Introduction.

There  is  general  agreement  that  in  all  eukaryotes,  phosphorylation  by  various 
cyclin-Cdk  complexes  controls  and  orchestrates  key  cell  cycle  events.  These  events 
include  commitment  in  G1  phase,  initiation  of  DNA  synthesis  in  S  phase,  and  spindle 
formation  and  elongation  in  mitosis.  However,  despite  knowing  a  great  deal  about  the 
cyclin-Cdk  complexes  themselves,  and  despite  years  of  investigation  by  many 
laboratories,  we  know  only  about  half  a  dozen  substrates  of  the  cyclin-Cdk  kinases,  and 
none  of  these  explain  the  control  of  critical  cell  cycle  events.  In  particular,  we  do  not 
know  what  substrates  have  to  be  phosphorylated  for  commitment  to  occur  (although  in 
mammalian  cells,  Rb  is  almost  certainly  one  of  the  substrates). 

The  purpose  of  the  present  work  is  to  develop  methods  for  identifying  substrates 
of  the  cyclin-Cdk  complexes.  We  are  especially  interested  in  G1  substrates. 

We initially proposed two main approaches. The first approach used two-dimensional
gels to examine phosphoproteins. Various cyclins were expressed from a GAL promoter,
and cells with the over-expressed cyclins were labeled with ³²P. The patterns of spots
on 2D gels were then compared between cells expressing and not expressing the cyclin.
Extra spots in the cyclin-expressing cells may be substrates.

The  second  approach  was  to  develop  antibodies  against  phosphoserine  followed 
by  proline,  and  phosphothreonine  followed  by  proline.  Such  antibodies  would  recognize 
proteins  phosphorylated  by  Cdk  complexes.  This  would  greatly  aid  in  the  identification 
and  characterization  of  substrates. 

Body  of  the  Report. 

Aim  1.  Visualization  of  Cdc28  substrates  on  2D  gels. 

i.  Experiments  with  2D  gels.  We  have  done  a  large  number  of  experiments 
examining  yeast  phosphoproteins  on  2D  gels.  We  optimized  conditions  for  labeling  cells, 
preparing  extracts,  and  running  gels.  One  manuscript  including  some  of  this  work  is  in 
press (Futcher et al., see Appendix). It has proven extremely difficult to find spots that
are  specifically  phosphorylated  by  cyclin  dependent  kinases;  the  proteins  that  are 
phosphorylated  in  this  way  are  non-abundant,  and  so  the  spots  are  relatively  weak.  This 
is  not  exactly  a  problem  of  sensitivity;  rather,  the  problem  is  that  the  2D  gels  contain 
numerous  spots  that  are  intensely  labeled,  and  these  obscure  nearby  weaker  spots.  At 
exposures  that  would  allow  us  to  see  the  weaker  spots,  the  whole  film  has  long  since 
turned black. This problem, which we term “coverage”, has limited the usefulness of this
approach. 

Nevertheless, by doing many experiments and many exposures, we were able to
see (at least in some experiments) weak spots that seemed to be G1- or S-phase substrates
of Cdc28 kinase. By examination of the molecular weight and isoelectric point of the
spots, we guessed that they might be Sic1 (an inhibitor of Clb-Cdc28 kinases), Mcm3,
and Mcm4 (= Cdc54). Identification of these phosphorylated spots was Aim 3 of the
Statement of Work. We have studied all of these substrates, most intensively Sic1.

ii. Identification of Sic1 as a Cln substrate. Cells over-expressing CLB2 and
labeled with ³²P gave 2D gels showing a faint series of spots with the molecular weight of
Sic1. Since it was already suspected for other reasons that Sic1 might be a substrate of
Cdc28 complexes, we investigated Sic1. Initially, we obtained purified Sic1 from Dr. M.
Mendenhall, and did in vitro kinase assays using different cyclin-Cdc28 complexes. We
found that all tested cyclin-Cdc28 complexes (Cln1, Cln2, Cln3, Clb1, Clb2, and Clb5
complexes) could phosphorylate Sic1 very well in vitro at low concentrations of Sic1
(Fig. 1). At higher concentrations of Sic1, the Clb1, Clb2, and Clb5 complexes were
inhibited as histone H1 kinases, but Cln1, Cln2, and Cln3 complexes were not. When
cyclins were immunoprecipitated from such kinase reactions, substantial amounts of Sic1
co-precipitated with Clb1, Clb2, and Clb5, but little or no Sic1 co-precipitated with Cln1,
Cln2, or Cln3. We conclude from this that Sic1 binds tightly to the Clb-Cdc28
complexes, but not to the Cln-Cdc28 complexes. This tight binding to the Clb-Cdc28
complexes may be the reason that Sic1 inhibits these complexes.


Fig. 1. Sic1 and Mcm3 are phosphorylated by Cln- and Clb-Cdc28 kinases.

A (Left). 0.4 µg of Sic1, 0.4 µg of Mcm3, and 1 µg of histone H1 and ³²P-ATP were
mixed with Cln1-, Cln2-, or Cln3-Cdc28 complexes. (Right) 0.4 µg of Mcm3 and
0.4 µg of Sic1 and ³²P-ATP were mixed with Clb1- or Clb2-Cdc28 kinase. Results were
visualized by autoradiography. Similar results were obtained when substrates were
used separately.

B. The cyclins used in part A were visualized by Western blotting.


We  tried  similar  experiments  with  p21,  a  human  protein  that  inhibits  human  G1 
cyclin/CDK  complexes.  Perhaps  surprisingly,  human  p21  was  a  specific  inhibitor  of 
yeast Cln1-, Cln2-, and Cln3-Cdc28 complexes, but did not inhibit Clb1, Clb2, or Clb5
complexes  (at  least  at  the  concentrations  we  tried).  Once  again,  tight  binding  (as  shown 
by  co-immuno-precipitation)  correlated  with  inhibition.  It  has  been  suggested  that  in 
human  cells,  p21  discriminates  between  different  cyclin-CDK  complexes  by  recognizing 
the  CDK  component.  Our  results  with  Cdc28  complexes  show  that  p21  must  also 
recognize  the  cyclin  component,  since  the  CDK  was  constant  in  our  experiments. 
Furthermore,  the  features  the  human  inhibitor  recognizes  in  human  G1  cyclin-Cdk 
complexes  seem  to  be  preserved  in  the  yeast  complexes. 

When Sic1, Clb5-Cdc28, and ³²P-ATP were mixed in a kinase reaction, and the
products were run on 2D gels, we could see at least 5 phosphorylated forms of Sic1
spread out in the isoelectric focusing dimension (Fig. 2). These presumably are Sic1 with
1, 2, 3, 4, or 5 phosphates. We have evidence that many if not all of these also occur in
vivo. After long incubations with Clb-Cdc28 kinase, as many as 13 different isoforms
can be seen (data not shown).

Fig. 2. Sic1 is phosphorylated on multiple sites by Clb5-Cdc28. 0.4 µg of Sic1 was mixed
with Clb5-Cdc28 and ³²P-ATP. Products were run on 2D gels (acidic isoelectric points
shown to the right) and imaged by autoradiography. Each spot represents a different,
³²P-labeled charge isoform.


With this preliminary evidence in hand, we went on to characterize the
relationship of Sic1, Clns, and Start. This work has been published in Science (Schneider
et al. 1996), and a reprint is attached, so the results in that paper are described below only
briefly. We showed that Sic1 was a phosphoprotein in vivo. We showed that in vivo,
hyper-phosphorylated forms of Sic1 depended on induction of Clns. We showed that
degradation of Sic1 depended on the ubiquitin-conjugating enzyme Cdc34, and that
degradation also depended on Clns. This suggested that Cln-Cdc28 complexes directly
phosphorylate Sic1, and that this phosphorylation then allows phosphorylated Sic1 to be
degraded via Cdc34 and the ubiquitin pathway (Schneider et al. 1996).

Genetic experiments supported this idea. Most strikingly, a sic1 deletion
suppressed the lethality of the cln1 cln2 cln3 triple deletion strain. This suggests that an
essential function of the Clns is promoting destruction of Sic1 (Schneider et al. 1996).
More recently, biochemical experiments of Deshaies and co-workers have added strong
support to the idea that phosphorylation of Sic1 by Cln-Cdc28 is a pre-requisite for
ubiquitin-mediated destruction of Sic1, and subsequent progression into S-phase (Verma
et al. 1997).

Sic1 is an inhibitor of Clb-Cdc28 complexes, and the earliest cell cycle roles of
these kinases are spindle formation and initiation of S-phase. We demonstrated that in a
sic1 null mutant, S-phase was advanced with respect to budding, and the timing of
S-phase was now almost independent of Cln expression (Schneider et al. 1996). These two
results suggest that S-phase is normally linked to Start via Sic1. In the absence of Sic1,
S-phase becomes independent of Start and independent of Clns.

Our work on Sic1 has been extended into mammalian cells by others. In
mammalian cells, the CDK inhibitor p27 is at least somewhat analogous to Sic1. It has
been shown that p27 is phosphorylated by cyclin-CDK activity, and that this
phosphorylation is essential for the degradation of p27 (Sheaff et al. 1997; Montagnoli et
al. 1999; Morisaki et al. 1997). In this respect, the control of human p27 seems to be
exactly analogous to the control of yeast Sic1.

iii. Investigation of Potential Substrates involved in Replication. As
described above, Mcm3 and Mcm4 (= Cdc54) are potential substrates identified in the 2D
gel system. Mcm3 and Mcm4 are both “Mcm” (Mini-Chromosome Maintenance)
genes. There are six different Mcm proteins in yeast, each of which is essential for
viability. The six Mcms are structurally related to each other, and there are very closely
related homologs in other eukaryotes, including humans, and also in Archaebacteria. The
Mcm proteins are required for DNA replication (reviewed by Tye, 1994). Recent
research suggests that the Mcms form a complex (a hexamer?) at origins of replication,
and when replication begins, these complexes move away from the origins with the forks,
possibly acting as helicases (reviewed by Leatherwood, 1998).

To gain further evidence that Mcm3 might be a substrate of Cdc28, we labeled
cells with ³²P, immunoprecipitated Mcm3 from the extracts with an anti-Mcm3
monoclonal antibody, and analysed the results on 2D gels. These showed five isoforms
of Mcm3 (Fig. 3a). These isoforms were absent in a cdc28-4 strain at the restrictive
temperature (Fig. 3b), strongly suggesting that Mcm3 is indeed phosphorylated in vivo in
a CDC28-dependent way. Furthermore, we obtained purified, soluble Mcm3 protein from
baculovirus-infected cells, mixed it with active Cdc28 kinase and ³²P-ATP in vitro, and
analyzed the products using 2D gels. We obtained a set of phosphorylated spots similar
(but not identical) to those seen in vivo (Fig. 3c). We also showed that Mcm3 was a good
substrate for both Cln and Clb forms of Cdc28 (Fig. 1). In the case of Mcm4, we have no
specific antibody, so equivalent experiments have not yet been done. However, it is
known that the Xenopus Mcm4 homolog can be phosphorylated by Cdc2 (the Cdc28
homolog) (Hendrickson et al. 1996).


[Fig. 3 panels, left to right: Mcm3, CDC28+; Mcm3, cdc28; Mcm3, in vitro]

Fig. 3. (Left and middle) Cells were labeled with ³²P, extracts were made, and
immunoprecipitated with anti-Mcm3 monoclonal antibody. Immunoprecipitates were
run on 2D gels. On the left, the strain was wild-type CDC28 grown at 37°C; in the
middle, the strain was cdc28-4 grown at 37°C. The spots seen on the left migrated at the
appropriate pI and mass for Mcm3, and were not seen in a control experiment lacking
antibody. (Right) Purified Mcm3 phosphorylated in vitro with Cln2-Cdc28.


Both  Mcm3  and  Mcm4  have  clustered  potential  Cdc28  phosphorylation  sites. 
The  Cdc28  phosphorylation  sites  in  Mcm4  are  highly  conserved,  and  for  instance  are 
found  in  the  Mcm4  homologs  in  S.  pombe,  Xenopus,  and  humans,  suggesting  these  sites 
may  be  very  important.  In  other  work,  we  have  mutated  the  conserved  Cdc28 
phosphorylation  sites  in  Mcm3  and  in  Mcm4.  Although  these  mutations  have  little 
phenotype  on  their  own,  they  give  lethality  or  temperature-sensitive  lethality  when 
combined  with  certain  phosphorylation  site  mutations  in  other  replication  proteins  such 
as Orc2. For instance, an orc2* mcm4* mutant is a ts lethal, and an orc2* mcm4* orc6*
triple mutant is an unconditional lethal (where * indicates a mutant lacking all Cdc28
consensus  phosphorylation  sites).  We  are  still  pursuing  this  work,  but  at  present  it 
appears  that  Mcm4  at  least  is  an  important  substrate  of  Cdc28  in  yeast,  and  may  well  be 
an  important  substrate  in  human  cells. 
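
One way to make “clustered potential Cdc28 phosphorylation sites” concrete is to scan a sequence for the minimal motif, a serine or threonine followed by a proline, and ask how many hits fall within a short window. The sketch below is illustrative only: the toy sequence is invented for the example (it is not Mcm3 or Mcm4), and the function names and the 10-residue window are assumptions.

```python
import re


def sp_tp_sites(seq):
    """Return 1-based positions of Ser/Thr residues that are followed by Pro."""
    # The lookahead keeps the proline out of the match, so adjacent motifs are all found.
    return [m.start() + 1 for m in re.finditer(r"[ST](?=P)", seq)]


def max_sites_in_window(sites, window):
    """Largest number of sites that fall within any stretch of `window` residues."""
    best = 0
    for i, start in enumerate(sites):
        count = sum(1 for s in sites[i:] if s < start + window)
        best = max(best, count)
    return best


# Hypothetical toy sequence: three S/T-P motifs near the N-terminus, one far away.
toy = "MASPKKTPLSPA" + "G" * 20 + "TPK"
sites = sp_tp_sites(toy)
cluster = max_sites_in_window(sites, window=10)
print(sites, cluster)
```

A real analysis would substitute the actual protein sequences and might use a stricter consensus; the minimal S/T-P motif is used here because it is the one described in this report.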

Aim 2. Develop antibodies against phosphoSer-Pro and phosphoThr-Pro.

Because  of  the  difficulty  in  seeing  spots  phosphorylated  by  Cdc28  (not  to  mention 
the  difficulty  in  purifying,  identifying,  and  analysing  them),  we  would  like  antibodies  that 
could  specifically  recognize  the  phosphorylated  forms  of  Cdc28  substrates.  Since  Cdc28 
almost  always  phosphorylates  a  serine  or  threonine  followed  by  a  proline  (i.e.,  SP  or  TP) 
we  would  like  antibodies  directed  against  phospho-S-P  and  phospho-T-P.  Good 
antibodies  have  been  made  against  phospho-tyrosine  (Ross  et  al.  1981),  and  some 
antibodies  have  been  made  against  phospho-threonine  (Heffetz  et  al.  1989).  The  extra 
proline  should  make  for  a  much  better  epitope  than  phospho-Ser  or  phospho-Thr  alone. 
We made a set of peptides to make the antibodies desired. The peptides were:

1. CGGpSPGGK
2. RAApSPAAC
3. CNNpSPNNH
4. CGGpTPGGK
5. RAApTPAAC
6. CNNpTPNNH

Each peptide was conjugated (through the terminal cys residue) to three different
carrier proteins: keyhole limpet hemocyanin (KLH), chicken ovalbumin, and bovine serum
albumin (BSA). The immunization strategy to get (e.g.) anti-phospho-Ser-Pro antibodies was to
do cycles of immunizations with peptides 1 and 2 coupled to different carriers. Peptide 1
coupled to KLH was the primary antigen; peptide 2 coupled to BSA was the first boost;
peptide 1 coupled to BSA was the second boost; peptide 2 coupled to KLH was the third
boost; peptide 1 coupled to BSA was the fourth boost; and finally peptide 2 coupled to
KLH was the fifth boost. Note that because of the different peptide and carrier
sequences, the only thing in common between all the antigens was phospho-Ser-Pro. A
similar series of immunizations was done with a second set of mice using the
phospho-Thr-Pro peptides.
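
The logic of this schedule can be checked mechanically. The sketch below, with hypothetical helper names, treats the phosphorylated residue (pS) as a single token, finds the longest stretch shared by peptides 1 and 2, and confirms that neither a single peptide nor a single carrier is present in every immunization.

```python
def tokenize(peptide):
    """Split a peptide string into residues, keeping 'pS'/'pT' as one token."""
    tokens, i = [], 0
    while i < len(peptide):
        if peptide[i] == "p":
            tokens.append(peptide[i:i + 2])
            i += 2
        else:
            tokens.append(peptide[i])
            i += 1
    return tokens


def longest_common_run(a, b):
    """Longest contiguous run of tokens present in both token lists."""
    best = []
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k > len(best):
                best = a[i:i + k]
    return best


peptides = {1: "CGGpSPGGK", 2: "RAApSPAAC"}

# (peptide number, carrier) for the primary immunization and the five boosts.
schedule = [(1, "KLH"), (2, "BSA"), (1, "BSA"), (2, "KLH"), (1, "BSA"), (2, "KLH")]

shared = longest_common_run(tokenize(peptides[1]), tokenize(peptides[2]))
peptides_used = {p for p, _ in schedule}
carriers_used = {c for _, c in schedule}
print("shared epitope:", "-".join(shared))
print("peptides used:", sorted(peptides_used), "carriers used:", sorted(carriers_used))
```

Because two peptides and two carriers alternate, the only feature common to all six antigens is the shared pS-P stretch, which is the point made in the text.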

In  our  first  attempt,  mice  achieved  high  serum  titres  of  antibody  against  both  the 
phospho-Ser-Pro  peptides,  and  the  phospho-Thr-Pro  peptides.  Unfortunately,  all  six 
immunized  mice  then  died  quite  suddenly  just  before  fusions  were  done,  and  we  had  to 
start  again.  In  the  second  attempt,  all  three  phospho-Ser-Pro  mice  survived,  fusions  were 
done,  and  we  got  what  appear  to  be  excellent  monoclonal  antibodies  (see  below). 
However, two of the three phospho-Thr-Pro mice died, and the surviving mouse had
the  lowest  serum  titres  of  antibody.  This  mouse  was  fused,  and  we  may  have  obtained 
good  monoclonal  antibodies.  It  is  possible  that  antibodies  to  these  antigens  are  toxic. 

Monoclonal  lines  from  the  fusions  were  initially  screened  against  peptide  2 
coupled  to  KLH,  which  was  the  antigen  used  in  the  final  boost.  Not  surprisingly,  we  got 
many  positive  lines.  However,  all  positive  lines  were  then  screened  against  peptide  1 
coupled to BSA, and finally, all remaining positive lines were screened against peptide 3
coupled  to  ovalbumin.  We  obtained  three  strongly  positive  cell  lines  (i.e.,  lines 
producing  an  antibody  that  reacts  with  all  three  test  antigens).  Since  neither  peptide  3  nor 
ovalbumin  was  ever  used  to  immunize  these  mice,  the  fact  that  antibodies  react  with  it  is 
a  strong  sign  that  the  correct  specificity  (phospho-Ser-Pro)  has  in  fact  been  achieved. 
Furthermore, the antibodies also react with peptide alone (on plastic substratum); this
excludes  the  possibility  that  the  chemical  agent  used  to  couple  the  peptides  to  carrier 
protein  has  generated  the  epitope  with  which  the  antibodies  are  reacting. 
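
The three-step screen is in effect a sequential filter: a line is kept only if it stays positive against each test antigen in turn. A minimal sketch follows; the clone names and reactivity patterns are invented placeholders, not data from these fusions.

```python
def sequential_screen(reactivity, antigens):
    """Keep only clones that are positive against every antigen, screened in order."""
    survivors = list(reactivity)
    for antigen in antigens:
        survivors = [clone for clone in survivors if antigen in reactivity[clone]]
    return survivors


# Screening order described in the text.
screen_order = ["peptide2-KLH", "peptide1-BSA", "peptide3-ovalbumin"]

# Hypothetical reactivity patterns for three imaginary hybridoma lines.
reactivity = {
    "clone_A": {"peptide2-KLH"},                                        # lost at step 2
    "clone_B": {"peptide2-KLH", "peptide1-BSA"},                        # lost at step 3
    "clone_C": {"peptide2-KLH", "peptide1-BSA", "peptide3-ovalbumin"},  # passes all steps
}

positives = sequential_screen(reactivity, screen_order)
print(positives)
```

Only a line of the clone_C type, positive on an antigen whose peptide and carrier were never used for immunization, supports the claim of phospho-Ser-Pro specificity.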

We  are  still  in  the  process  of  characterizing  these  antibodies,  but,  at  least  for  the 
phospho-Ser-Pro  antibodies,  all  preliminary  indications  are  that  we  have  the  correct 
specificity.  However,  it  is  still  an  open  question  whether  phospho-Ser-Pro  will  be 
recognized  in  all  contexts,  or  only  some.  An  additional  concern  is  that  many  in  vivo  sites 
appear  to  be  phospho-Thr-Pro,  and  our  anti-phospho-Thr-Pro  antibodies  do  not  seem  to 
be  as  good  as  the  ones  directed  against  phospho-Ser-Pro.  We  hope  to  repeat  the 
immunization,  and  hope  that  more  mice  will  survive.  These  and  similar  antibodies  are 
potentially  extremely  important  reagents  for  the  characterization  of  CDK  substrates. 

The  immunization  strategy  we  used  (cycles  of  multiple  peptides  on  multiple 
carriers,  with  only  one  small  epitope  in  common)  seemed  to  be  very  effective.  Possibly 
this  immunization  strategy  will  be  useful  for  getting  highly  specific  antibodies  against 
other  small  epitopes.  For  instance,  one  could  imagine  systematically  getting  monoclonal 
antibodies  specifically  against  mutant  forms  of  growth-factor  receptors;  this  might  be 
useful  in  anti-cancer  therapy. 

Aim  3.  Identify  genes  encoding  substrates  visualized  on  2D  gels. 

This  was  completed;  the  work  has  been  included  in  the  narrative  for  Aim  1. 

Aim  4.  Look  for  homologous  human  G1  phase  substrates. 

Since analogs or homologs of Sic1, Mcm3, and Mcm4 were already known in
human cells, it was not necessary for us to identify them. Other labs extended our work
on Sic1 to its human analog, p27.


Key  Research  Accomplishments: 

• optimization of 2D gels for phosphate-labeled yeast proteins

• identification of Sic1 as a key G1 phase substrate of Cln-Cdc28 kinase

• characterization of the role of phosphorylation of Sic1

•  identification  of  Mcm3  and  Mcm4  as  other  probable  Cdc28  substrates 

•  generation  of  anti-phospho-Ser-Pro  antibodies 

•  development  of  a  novel  immunization  strategy 

Schneider, B. L., Yang, Q.-H., and Futcher, A. B. (1996). Linkage of Replication to Start
by the Cdk Inhibitor Sic1. Science 272, 560-562.

Conclusions.

We  have  tried  hard  with  the  2D  gel  approach  and  have  optimized  many  steps, 
completing Aim 1. However, we find that without some enrichment step for substrates of
the  desired  type,  the  2D  gel  approach  is  probably  not  practical  for  non-abundant 
substrates.  This  is  because  of  the  problem  of  “coverage”,  not  because  of  a  problem  of 
sensitivity.  Nevertheless,  the  2D  gel  approach  has  helped  us  find  one  relevant  substrate 
(Sic1) and probably two others (Mcm3 and Cdc54), completing Aim 3. Work on these
substrates is going well, and it is likely that at least two of these substrates or their
homologs (Sic1 and Mcm4) are relevant in humans as well as in yeast, which addresses
Aim  4.  Anti-phospho-SP  and  perhaps  also  anti-phospho-TP  antibodies  have  been 
obtained  using  a  novel  immunization  strategy,  completing  Aim  2.  These  antibodies  are 
potentially  extremely  useful;  their  characterization  is  continuing.  These  may  allow  the 
2D  approach  to  be  used  successfully.  Finally,  the  immunization  strategy  we  used  may  be 
applicable  to  other  interesting,  small  epitopes. 

Linkage of Replication to Start
by the Cdk Inhibitor Sic1

B. L. Schneider,* Q.-H. Yang,* A. B. Futcher†

In Saccharomyces cerevisiae, three G1 cyclins (Clns) are important for Start, the event
committing cells to division. Sic1, an inhibitor of Clb-Cdc28 kinases, became
phosphorylated at Start, and this phosphorylation depended on the activity of Clns. Sic1 was
subsequently lost, which depended on the activity of Clns and the ubiquitin-conjugating
enzyme Cdc34. Inactivation of Sic1 was the only nonredundant essential function of Clns,
because a sic1 deletion rescued the inviability of the cln1 cln2 cln3 triple mutant. In sic1
mutants, DNA replication became uncoupled from budding. Thus, Sic1 may be a
substrate of Cln-Cdc28 complexes, and phosphorylation and proteolysis of Sic1 may regulate
commitment to replication at Start.


Before yeast can replicate DNA, they must pass Start, which requires a cyclin-dependent
protein kinase composed of a catalytic subunit (Cdc28) and one of three G1 cyclins
(Cln1, -2, or -3) (1). After Start, B-type cyclin-Cdc28 kinases such as Clb5-Cdc28
and Clb6-Cdc28 must be activated to allow replication (2). Although Clb5- and
Clb6-Cdc28 complexes are present in G1 phase, they are initially inactive because of
inhibition by the Sic1 protein (2, 3). Activation of Clb5- and Clb6-Cdc28 occurs after
Sic1 is targeted for proteolysis by the ubiquitin-conjugating enzyme Cdc34 (2). Thus,
a cdc34 mutant arrests with a 1N DNA content because it cannot degrade Sic1, but
nevertheless buds, and duplicates its spindle pole body.

It is not known how Start triggers Sic1 inactivation or how replication is tied to
other Start-dependent events such as budding and duplication of the spindle pole
body. Is Start a single event that affects multiple pathways, or is Start a collection of
events, one of which regulates Sic1 proteolysis and replication?

We asked whether Cln-Cdc28 complexes phosphorylate Sic1, thereby targeting it
for proteolysis. Sic1 coprecipitates with Cdc28 (4), has one of the highest densities
of potential Cdc28 phosphorylation sites of any known yeast protein (5), and can be
phosphorylated on many sites by Cdc28 in vitro (4, 6).

Fig. 1. Sic1 is a phosphoprotein in vivo. Extracts were made as described (17), and Sic1
was immunoprecipitated (14). The immunoprecipitates were treated or not treated with
phosphatase (18), resolved by SDS-PAGE (15), blotted to nitrocellulose, and Sic1 was
detected (16). Lane 1, asynchronous cells; lane 2, asynchronous sic1 cells; lane 3,
strain #31 (19) arrested at the cdc34 block at 37°C; lane 4, as in lane 3, but treated
with calf intestinal phosphatase (CIP); lane 5, as in lanes 3 and 4, but treated with CIP
and the phosphatase inhibitor β-glycerolphosphate (Inh.).

Fig. 2. Loss of Sic1 depends on CLNs and on CDC34. Abundance and phosphorylation of Sic1
were analyzed in reciprocal shift experiments (20). Strain #31 (cln1 cln2 GAL-CLN3 cdc34)
(19) was used. (A) Cells were grown in galactose medium at 23°C (lane 1), shifted to
glucose at 23°C for 3 hours to synchronize cells at Start (lane 2), then shifted to 37°C
for another hour to inactivate Cdc34 (lane 3). Cln expression was then restored by
shifting back to galactose medium, but cells were held at 37°C (Cdc34⁻). Samples were
taken every 30 min (lanes 4 to 9). As a control, Cln expression and Cdc34 function were
both restored (lanes 10 to 15) to doubly blocked cells. (B) Cells were grown in galactose
medium at 23°C (lane 1), shifted to 37°C for 3 hours to synchronize cells at the cdc34
block (lane 2), then shifted to glucose at 37°C for 1 hour to shut off GAL-CLN3 (lane 3).
Cdc34 function was restored by a shift to 23°C, but cells were kept in glucose medium
(Cln⁻). Samples were taken every 30 min (lanes 4 to 9). As a control, Cdc34 function and
Cln expression were both restored (lanes 10 to 15) to doubly blocked cells. FACS analysis
showed that the cells in lanes 4 to 9 (A and B) failed to replicate DNA, whereas the cells
in lanes 10 to 15 did replicate DNA.

Sic1 is a phosphoprotein in vivo. Resolution of Sic1 by SDS-polyacrylamide gel
electrophoresis (PAGE) followed by immunoblotting showed a broad, fuzzy band that
may contain multiple forms of Sic1. Phosphatase treatment converted this fuzzy band
(more phosphorylated form) to a band of greater mobility (less phosphorylated form)
(Fig. 1).

To study the relation between the Clns, phosphorylation and proteolysis of Sic1,
and DNA synthesis, we constructed a cln1 cln2 GAL-CLN3 cdc34-2 (temperature-sensitive)
strain and did reciprocal shift experiments. As expected, cells shifted from the
Cln⁻Cdc34⁺ state to the Cln⁺Cdc34⁻ state arrested with a Cdc34⁻ phenotype
without dividing. Sic1 accumulated in the less phosphorylated form in Cln⁻-arrested
cells, but was phosphorylated to a greater extent when Cln was restored (7) (Fig. 2A,
compare lanes 3 and 4). However, in the absence of Cdc34 function (Cln⁺Cdc34⁻),
this highly phosphorylated Sic1 remained undegraded (Fig. 2A, lanes 4 to 9). In
control cells arrested in the Cln⁻Cdc34⁻ state, then released to the Cln⁺Cdc34⁺ state,
Sic1 became more phosphorylated when Cln was restored, and then disappeared,
presumably because of proteolysis (7) (Fig. 2A, lanes 10 to 15). These cells then
reentered a normal cell cycle. Thus, in vivo, the Cln-Cdc28 complexes are needed to
generate highly phosphorylated Sic1, which is stable in the absence, but not in the
presence, of Cdc34 function.

Cdc34 has been considered to act downstream of Clns and Cdc28. Surprisingly, however,
cells shifted from the Cln⁺Cdc34⁻ state to the Cln⁻Cdc34⁺ state did not enter S
phase or divide and in all respects maintained a Cdc34⁻ phenotype. This result suggests
that the Cdc34 function cannot be completed in the absence of Cln-Cdc28 activity. Highly
phosphorylated Sic1 accumulated in the Cln⁺Cdc34⁻ cells (7) (Fig. 2B, lane 2); Sic1
then became less phosphorylated, but not degraded, after the shift to the Cln⁻Cdc34⁺
state (7) (Fig. 2B, lanes 4 to 9). This result suggests that the Cdc34⁻ phenotype is
maintained in the Cln⁻Cdc34⁺ cells because the less phosphorylated form of Sic1 cannot be
degraded in the absence of Cln activity. When cells were shifted from Cln⁺Cdc34⁻ to
Cln⁺Cdc34⁺, the more phosphorylated form of Sic1 that had accumulated at the cdc34
block disappeared (Fig. 2B, lanes 10 to 15), and the cells went through S phase and
reentered a normal cycle. These experiments show that Sic1 loss requires Cln function as
well as Cdc34 function, and that the more phosphorylated form of Sic1 is dependent on Cln
activity and correlated with Sic1 loss. Because cells arrest before S phase regardless of the
phosphorylation state of Sic1, both forms must inhibit Clb-Cdc28 complexes.

These results are consistent with a model wherein Cln-Cdc28 complexes phosphorylate
Sic1, and this phosphorylation targets Sic1 for degradation by the Cdc34 pathway.
However, the experiments are correlative, and other mechanisms are also possible. For
example, Cln-Cdc28 complexes may serve to activate Cdc34 itself, and the phosphorylation
of Sic1 may be a correlated but irrelevant event.
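
The proposed model can be written as a small truth table over the four Cln/Cdc34 states used in the reciprocal shift experiments. This is a sketch of the first model only (phosphorylation targets Sic1 for Cdc34-dependent degradation), not of the alternative mechanisms, and the function and field names are hypothetical.

```python
from itertools import product


def sic1_fate(cln_active, cdc34_active):
    """Outcome predicted by the proposed model for one Cln/Cdc34 state."""
    # Cln-Cdc28 activity is needed to generate the highly phosphorylated form.
    hyperphosphorylated = cln_active
    # The Cdc34 pathway removes Sic1 only when that form is present.
    degraded = hyperphosphorylated and cdc34_active
    # Both Sic1 forms inhibit Clb-Cdc28, so S phase requires loss of Sic1.
    s_phase = degraded
    return {
        "hyperphosphorylated": hyperphosphorylated,
        "degraded": degraded,
        "s_phase": s_phase,
    }


for cln, cdc34 in product([False, True], repeat=2):
    state = f"Cln{'+' if cln else '-'} Cdc34{'+' if cdc34 else '-'}"
    print(state, sic1_fate(cln, cdc34))
```

Under these assumptions only the state with both Cln and Cdc34 active predicts Sic1 loss and entry into S phase, which matches the pattern reported for lanes 10 to 15 of Fig. 2.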

If a major function of Clns is to promote proteolysis of Sic1, then Clns should be less
important in a sic1 mutant. Indeed, a sic1 mutation suppressed the lethality of a cln1
cln2 cln3 triple null mutation (Fig. 3B, sectors 1, 3, and 4). Thus, the only
nonredundant essential function of the Clns is to inactivate Sic1. The cln1 cln2 cln3 triple
mutation is also suppressed by a mutation called BYC1 (8), and it now appears that
BYC1 is allelic to sic1 (9). This suppression by BYC1 occurs even if clb2, clb5, or pcl1 is
also deleted (8). Clns 1, -2, and -3 have other important functions that are compromised
in the cln1 cln2 cln3 sic1 quadruple mutant: plating efficiency is poor, budding
and cell morphology are highly abnormal, and the cells are generally sick. Presumably,
budding is now mediated by combinations of other cyclins such as Pcl1, Pcl2, Clb5,
and Clb6 (10).

Fig. 3. A sic1 deletion suppresses lethality of cln1 cln2 cln3. (A) YEP + 1% raffinose
+ 1% galactose. (B) YEP + 2% glucose. Plates were incubated at 30°C for 3 days. Strains
were as follows: 1, BS147 (pGAL-CLN3 Δclns Δsic1); 2, BS100 (GAL-CLN1 Δclns); 3, BS178
(GAL-CLN1 Δclns Δsic1); and 4, BS152 (Δclns Δsic1) (19).

If Sic1 is an important and specific inhibitor of replication, then a sic1 mutation
might uncouple DNA replication from other Start events, such as budding. To test this
hypothesis, we obtained small unbudded cells from an exponential culture of sic1
cells and examined the cells for DNA content by fluorescence-activated cell sorting
(FACS). At least 20% of the unbudded cells were already 2N, whereas there were
essentially no 2N cells in the equivalent fraction from a wild-type culture. After
reinoculation into fresh medium, the sic1 cells replicated DNA much earlier than the
wild-type cells, but budded at about the same time (Fig. 4A). [In other, similar
experiments, the sic1 mutation did advance budding slightly, although never as much as
the advance in S phase (2, 11). The early activation of Clb5 that occurs in sic1 cells
may advance budding.]

Fig. 4. A sic1 deletion uncouples S phase from budding. (A) Small unbudded cells of
strain W303a (19) (□) or its isogenic sic1::URA3 derivative BS193 (■) were obtained by
elutriation (21). Cells were reinoculated in fresh, warm medium, and samples were taken
every 15 min and analyzed for budding, cell volume, and DNA content (FACS) (22).
(B) Strain BS147 (pGAL-CLN3 Δclns Δsic1) (19) was grown in sucrose plus galactose. Cells
were washed and resuspended in medium containing sucrose but no galactose to turn off
GAL-CLN3. After 1 hour, small unbudded cells were collected by elutriation (21). Half
the sample was reinoculated into YNB medium with 2% sucrose (GAL-CLN3 off) (○), and the
other half was reinoculated into YNB medium with 1% sucrose and 1% galactose (GAL-CLN3
on) (●). Samples were taken every 30 min and analyzed as in (A). W303a cells (19)
grown in YNB + 2% sucrose were elutriated and monitored after reinoculation (□).

In a second experiment, cln1 cln2 GAL-CLN3
sic1 cells were grown with GAL-CLN3
on, and then GAL-CLN3 was turned
off for 1 hour. Small unbudded cells were
obtained by elutriation. Fifty to 80% of
these cells had a DNA content greater than
1N, despite their lack of Cln. (The large
fraction of 2N cells probably resulted from
overexpression of CLB5 induced by GAL-CLN3.)
When the cells were released into
fresh medium, efficient budding was still
dependent on reexpression of Cln3, whereas
S phase was not (Fig. 4B). Thus, in sic1
mutants, replication and budding are
uncoupled; they occur at different times, and
budding is much more dependent on Cln
than is replication.

Although phosphorylation and loss of
Sic1 are dependent on both Cln and Cdc34
function, we have not shown that Sic1 is a
direct substrate of the Cln-Cdc28 kinase in
vivo, nor that Sic1 proteolysis is ubiquitin-mediated.
However, these are both strong
possibilities. Phosphorylation converts at
least one other protein into a substrate for
Cdc34-mediated proteolysis (12). Whatever
the precise mechanism by which Clns
and Cdc34 cause the loss of Sic1, our genetic
experiments show that this loss is
largely responsible for the normal dependence
of DNA replication on Start.


An analogous system may be used by
mammalian cells. Cyclin D-Cdk4 complexes
promote S phase by inhibiting function
of the retinoblastoma protein. In cells lacking
retinoblastoma, the cyclin D-Cdk4 activity
is no longer required (13).

The identification of Sic1 as a target of
Clns suggests that Start consists of several
component events. The Start event controlling
S phase is probably phosphorylation
of Sic1; phosphorylation of other substrates
may control budding and duplication of the
spindle pole body, and together these
phosphorylations constitute Start.
Abstract


Yeast  proteins  have  been  examined  by  two-dimensional  gel 
electrophoresis,  and  quantitative  information  has  been  gathered 
from  about  1400  spots.  There  is  an  enormous  range  of  protein 
abundance.  For  identified  spots,  there  is  a  good  correlation  between 
protein  abundance,  mRNA  abundance,  and  codon  bias.  For  each 
molecule  of  well-translated  mRNA,  there  are  about  4000  molecules 
of  protein.  The  relative  abundance  of  proteins  has  been  measured  in 
glucose  and  ethanol  media.  Protein  turnover  has  been  examined,  and 
is  insignificant  for  abundant  proteins.  Some  phosphoproteins  have 
been  identified.  The  behavior  of  proteins  in  differential 
centrifugation  experiments  has  been  examined.  Such  experiments 
with  2D  gels  can  give  a  global  view  of  the  yeast  proteome. 
Introduction


The  sequence  of  the  yeast  genome  has  been  determined  (9). 
More  recently,  the  number  of  mRNA  molecules  for  each  expressed 
gene  has  been  measured  (27,  30).  The  next  logical  level  of  analysis  is 
that  of  the  expressed  set  of  proteins.  We  have  begun  to  analyze  the 
yeast  proteome  using  two-dimensional  gels. 

Two-dimensional  (2D)  gel  electrophoresis  separates  proteins 
according  to  isoelectric  point  in  one  dimension  and  molecular  weight 
in  the  other  dimension  (21),  allowing  resolution  of  thousands  of 
proteins on a single gel. Although modern imaging and computing
techniques can extract quantitative data for each of the spots in a 2D
gel, there are only a few cases in which quantitative data have been
gathered from 2D gels. 2D gel electrophoresis is almost unique in its
ability  to  examine  biological  responses  over  thousands  of  proteins 
simultaneously.  It  should  therefore  allow  us  a  relatively 
comprehensive  view  of  cellular  metabolism. 

We  and  others  have  worked  towards  assembling  a  yeast 
protein  database  consisting  of  a  collection  of  identified  spots  in  2D 
gels  and  of  data  on  each  of  these  spots  under  various  conditions  (2,  7, 
8,  10,  23,  25).  These  data  could  then  be  used  in  analyzing  a  protein 
or a metabolic process. S. cerevisiae is a good organism for this
approach since it has a well-understood physiology, a large number
of mutants, and a sequenced genome. Given the sequence,
and  the  relative  lack  of  introns  in  S.  cerevisiae,  it  is  easy  to  predict 
the  sequence  of  the  primary  protein  product  of  most  genes.  This  aids 
tremendously  in  identifying  these  proteins  on  2D  gels. 

There  are  three  pillars  on  which  such  a  database  rests.  The 
first  is  visualization  of  many  protein  spots  simultaneously;  the 
second  is  quantification  of  the  protein  in  each  spot;  and  the  third  is 
identification  of  the  gene  product  for  each  spot.  Our  first  efforts  at 
visualization  and  identification  for  S.  cerevisiae  have  been  described 
(7,  8).  Here  we  describe  quantitative  data  for  these  proteins  under  a 
variety  of  experimental  conditions. 

Results 

Visualization  of  1400  spots  on  Three  Gel  Systems. 

Yeast  proteins  have  isoelectric  points  ranging  from  3.1  to  12.8, 
and  masses  ranging  from  less  than  10  kDa  to  470  kDa.  It  is  difficult 
to  examine  all  proteins  on  a  single  kind  of  gel,  because  a  gel  with  the 
needed range in pI and mass would give poor resolution of the
thousands of spots in the central region of the gel. Therefore, we
have used three gel systems: (1) pH “4-8” with 10% polyacrylamide;
(2) pH “3-10” with 10% polyacrylamide; and (3) non-equilibrium gels
with 15% polyacrylamide (7, 8). Each gel system allows good
resolution of a subset of yeast proteins.

Fig.  1  shows  a  pH  4-8,  10%  polyacrylamide  gel.  The  pH  at  the 
basic  end  of  the  isoelectric  focusing  gel  cannot  be  maintained 
throughout  focusing,  so  the  proteins  resolved  on  such  gels  have 
isoelectric  points  between  pH  4  and  pH  6.7.  For  these  pH  4-8  gels, 
we  see  600  to  900  spots  on  the  best  gels  using  multiple  exposures. 

The pH “3-10” gels (not shown) extend the pI range somewhat
beyond  pH  7.5,  allowing  detection  of  several  hundred  additional 
spots.  Finally,  we  use  non-equilibrium  gels  with  15%  acrylamide  in 
the  second  dimension.  These  allow  visualization  of  about  100  very 
basic  proteins,  and  about  170  small  proteins  (less  than  20  kDa).  In 
total,  using  all  three  gel  systems,  about  1400  spots  can  be  seen. 
These  represent  about  1200  different  proteins,  which  is  about  one- 
quarter  to  one-third  of  the  proteins  expressed  under  these  conditions 
(27,  30).  Here,  we  focus  on  the  proteins  seen  on  the  pH  “4-8”  gels. 

Although  nearly  all  expressed  proteins  are  present  on  these 
gels,  the  number  seen  is  limited  by  a  problem  we  call  “coverage”. 
Since there are thousands of proteins on each gel, many proteins
co-migrate, or nearly co-migrate. When two proteins are resolved, but
are close together, and one protein spot is much more intense than
the other, a problem arises in visualizing the weaker spot: at long
exposures  when  the  weak  signal  is  strong  enough  for  detection,  the 
signal  from  the  strong  spot  spreads,  and  covers  the  signal  from  the 
weaker  spot.  Thus,  weak  spots  can  be  seen  only  when  they  are  well- 
separated  from  strong  spots. 

For  a  given  gel,  the  number  of  detectable  spots  initially  rises 
with  exposure  time.  However,  beyond  an  optimal  exposure,  the 
number  of  distinguishable  spots  begins  to  decrease,  because  signals 
from  strong  spots  cover  signals  from  nearby  weak  spots.  At  long 
exposures  the  whole  autoradiogram  turns  black.  Thus,  there  is  an 
optimum  exposure  yielding  the  maximum  number  of  spots,  and  at 
this  exposure  the  weakest  spots  are  not  seen. 

Largely  because  of  the  problem  of  coverage,  the  proteins  seen 
are  strongly  biased  towards  abundant  proteins.  All  identified 
proteins have a codon adaptation index (CAI) of 0.18 or more, and
we  have  identified  no  transcription  factors  or  protein  kinases,  which 
are  non-abundant  proteins.  Thus,  this  technology  is  useful  for 
examining protein synthesis, amino acid metabolism, and glycolysis,
but not for examining transcription, DNA replication, or the cell cycle.

Spot Identification.

The  identification  of  various  spots  has  been  described  (7,  8). 
At  present,  169  different  spots  representing  148  proteins  have  been 
identified.  Many  of  these  spots  have  been  independently  identified 
(2,  10,  23,  25).  The  main  methods  used  in  spot  identification  have 
been  analysis  of  amino  acid  composition,  gene  over-expression, 
peptide  sequencing,  and  mass  spectrometry. 

Pulse-chase  experiments  and  protein  turnover. 

Pulse-chase experiments were done to measure protein half-lives
(Materials and Methods). Cells were labeled with 35S-methionine
for 10 min., and then an excess of unlabeled methionine
was added. Samples were taken at 0, 10, 20, 30, 60 and 90 minutes
after the beginning of the chase. Equal amounts of 35S were loaded
from each sample. 2D gels were run and spots were quantitated.
Surprisingly, almost every spot was nearly constant in amount of
radioactivity over the entire time course (not shown). A few spots
shifted from one position to another because of post-translational
modifications (e.g., phosphorylation of Rpa0 and Efb1). Thus, the
proteins being visualized are all or nearly all very stable proteins,
with  half-lives  of  more  than  90  min.  Gygi  et  al.  (10)  have  come  to  a 
similar  conclusion  by  using  the  N-end  rule  to  predict  protein  half- 
lives.  This  result  does  not  imply  that  all  yeast  proteins  are  stable. 
The  proteins  being  visualized  are  abundant  proteins,  and  this  is 
partly  because  they  are  stable  proteins. 

Protein  Quantitation. 

Because  all  the  proteins  seen  had  effectively  the  same  half  life, 
the  abundance  of  each  protein  was  directly  proportional  to  the 
amount  of  radioactivity  incorporated  during  labeling.  Thus,  after 
taking  into  account  the  total  number  of  protein  molecules  per  cell, 
the  average  content  of  methionine  and  cysteine,  and  the  methionine 
and  cysteine  content  of  each  identified  protein,  it  was  possible  to 
calculate  the  abundance  of  each  identified  protein  (Table  1;  Table  2) 
(Experimental  Procedures).  About  1000  unidentified  proteins  were 
also  quantified  assuming  an  average  content  of  met  and  cys. 

Many  proteins  give  multiple  spots  (7,  8).  The  contribution 
from  each  spot  was  summed  to  give  the  total  protein  amount. 
However,  many  proteins  probably  have  minor  spots  we  are  not 
aware  of,  causing  the  amount  of  protein  to  be  underestimated. 
When the proteins on a pH 4-8 gel were ordered by abundance,
the  most  abundant  protein  had  8904  ppm,  the  10th  most  abundant 
protein  had  2842  ppm,  the  100th  had  314  ppm,  the  500th  57  ppm, 
and  the  1000th  protein  (visualized  at  greater  than  optimum 
exposure)  had  23  ppm.  Thus  there  is  more  than  a  300-fold  range  in 
abundance amongst the visualized proteins. The most abundant 10
proteins  account  for  about  25%  of  the  total  protein  on  the  pH  4-8  gel, 
the  most  abundant  60  proteins  account  for  50%,  and  the  most 
abundant  500  proteins  account  for  80%.  Since  it  seems  likely  that 
the  pH  4-8  gels  give  a  representative  sampling  of  all  the  proteins,  we 
estimate  that  half  of  total  cellular  protein  is  accounted  for  by  less 
than  100  different  gene  products.  These  are  principally  glycolytic 
enzymes  and  proteins  involved  in  protein  synthesis. 

Correlation  of  protein  abundance  with  mRNA  abundance. 

Estimates  of  mRNA  abundance  for  each  gene  have  been  made 
by  Serial  Analysis  of  Gene  Expression  (SAGE)  (27),  and  by 
hybridization  of  cRNA  to  oligonucleotide  arrays  (30).  These  two 
methods  give  broadly  similar  results,  yet  each  method  has  strengths 
and  weaknesses  (Materials  and  Methods).  Table  1  lists  the  number 
of  molecules  of  mRNA  per  cell  for  each  gene  studied.  One  list 
(“mRNA”) uses data from SAGE analysis alone (27); a second list
incorporates  data  from  both  SAGE  and  hybridization  (30)  (“adjusted 
mRNA”,  Table  1;  Materials  and  Methods).  We  correlated  protein 
abundance  with  mRNA  abundance  (Fig.  2).  For  “adjusted  mRNA” 
versus protein, the Spearman rank correlation was 0.74 (p < 0.0001),
and the Pearson correlation on log-transformed data (Materials and
Methods) was 0.76 (p < 0.00001). We obtained similar correlations
for  “mRNA”  versus  protein,  and  also  using  other  data  transformations 
(Materials  and  Methods).  Thus,  several  statistical  methods  show  a 
strong  and  significant  correlation  between  mRNA  abundance  and 
protein  abundance.  Of  course,  the  correlation  is  far  from  perfect;  for 
mRNAs  of  a  given  abundance,  there  is  at  least  a  10-fold  range  of 
protein  abundance  (Fig.  2).  Some  of  this  scatter  is  probably  due  to 
post-transcriptional  regulation,  and  some  is  due  to  errors  in  the 
mRNA  or  protein  data.  For  example,  the  protein  Yef3  runs  poorly  on 
our  gels,  giving  multiple  smeared  spots.  Its  abundance  has  probably 
been  underestimated,  partly  explaining  the  low  protein/mRNA  ratio 
of  Yef3.  It  is  the  most  extreme  outlier  in  Fig.  2. 
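
The two statistics quoted above can be illustrated with a minimal Python
sketch. The function names and the six mRNA/protein pairs below are
hypothetical and are not taken from Table 1; the code only shows how a
Spearman rank correlation and a Pearson correlation on log-transformed
values can be computed.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def log_pearson(x, y):
    """Pearson correlation after log10 transformation of both variables."""
    return pearson([math.log10(a) for a in x], [math.log10(b) for b in y])


# Hypothetical (made-up) abundances in molecules per cell.
mrna = [1, 2, 5, 10, 50, 100]
protein = [3000, 20000, 9000, 60000, 150000, 500000]
r_s = spearman(mrna, protein)
r_log = log_pearson(mrna, protein)
print(f"Spearman r_s = {r_s:.3f}; Pearson r on log10 data = {r_log:.3f}")
```

Because ranks are unchanged by any monotonic transformation, the Spearman
value is the same for raw and log-transformed data, which makes it
convenient for abundances spanning several orders of magnitude.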

These data on mRNA (27, 30) and protein abundance (Table 1)
suggest that for each mRNA molecule, there are on average 4000
molecules of the cognate protein. For instance, for Act1 (actin) there
are about 54 molecules of mRNA per cell, and about 205,000
molecules of protein. Assuming an mRNA half-life of 30 min. (12)
and  a  doubling  time  of  120  min.,  this  suggests  that  an  individual 
molecule  of  mRNA  might  be  translated  roughly  1000  times.  These 
calculations  are  limited  to  mRNAs  for  abundant  proteins,  which  are 
likely  to  be  the  mRNAs  that  are  translated  best. 
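
The text does not spell out the arithmetic behind "roughly 1000 times";
the Python sketch below is one possible reading of it, and the function
names are ours. The first function takes the simple ratio of half-life to
doubling time. The second assumes stable protein diluted by exponential
growth and first-order mRNA decay; under those assumptions the ln 2
factors cancel and the result is the same.

```python
import math


def translations_per_mrna(proteins_per_mrna, mrna_half_life_min, doubling_time_min):
    """Simple ratio: proteins per mRNA scaled by half-life / doubling time."""
    return proteins_per_mrna * mrna_half_life_min / doubling_time_min


def translations_per_mrna_exponential(proteins_per_mrna, mrna_half_life_min,
                                      doubling_time_min):
    """Same quantity, assuming exponential growth and first-order mRNA decay."""
    growth_rate = math.log(2) / doubling_time_min           # per min
    # Synthesis rate needed to hold proteins_per_mrna against dilution by growth.
    synthesis_rate = proteins_per_mrna * growth_rate         # proteins/mRNA/min
    mean_mrna_lifetime = mrna_half_life_min / math.log(2)    # min
    return synthesis_rate * mean_mrna_lifetime


# Values quoted in the text.
actin_ratio = 205_000 / 54  # protein molecules per mRNA molecule for actin
simple = translations_per_mrna(4000, 30, 120)
exponential = translations_per_mrna_exponential(4000, 30, 120)
print(f"actin protein/mRNA ratio: {actin_ratio:.0f}")
print(f"translations per mRNA molecule: {simple:.0f} (simple), "
      f"{exponential:.0f} (exponential model)")
```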

A  full  complement  of  cell  protein  is  synthesized  in  about  120 
min.  under  these  conditions.  Thus,  4000  molecules  of  protein  per 
molecule  of  mRNA  implies  that  translation  initiates  on  an  mRNA 
about  once  every  2  sec.  This  is  a  remarkably  high  rate;  it  implies 
that  if  an  average  mRNA  bears  10  ribosomes  engaged  in  translation, 
then  each  ribosome  completes  translation  in  20  sec.;  if  an  average 
protein has 450 residues, then this implies translation of over 20
amino acids per second, a rate considerably higher than estimated in
mammalian cells (3 to 8 amino acids per second) (18). These estimates
depend  on  the  amount  of  mRNA  per  cell  (11,  27). 
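
The chain of estimates in this paragraph can be reproduced with a short
Python sketch. The function name and parameter names are ours; the default
values are those stated in the text (4000 proteins per mRNA, a 120 min.
doubling time, 10 ribosomes per mRNA, 450 residues per protein).

```python
def translation_rate_estimates(proteins_per_mrna=4000, doubling_time_min=120,
                               ribosomes_per_mrna=10, residues_per_protein=450):
    """Return (seconds per initiation, seconds per protein, residues per second)."""
    # One initiation per completed protein, spread over one doubling time.
    seconds_per_initiation = doubling_time_min * 60 / proteins_per_mrna
    # With N ribosomes on the mRNA at once, each takes N initiation intervals.
    seconds_per_protein = ribosomes_per_mrna * seconds_per_initiation
    residues_per_second = residues_per_protein / seconds_per_protein
    return seconds_per_initiation, seconds_per_protein, residues_per_second


interval, transit, elongation = translation_rate_estimates()
print(f"initiation every {interval:.1f} s; {transit:.0f} s per protein; "
      f"{elongation:.1f} residues per second")
```

Without rounding, the interval is slightly under 2 sec.; the text rounds it
to 2 sec. and the transit time to 20 sec., which gives 22.5 residues per
second.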

The  large  number  of  protein  molecules  that  can  be  made  from 
a  single  mRNA  raises  the  issue  of  how  abundance  is  controlled  for 
less abundant proteins. Many non-abundant proteins may be
unstable, and this would reduce the protein/mRNA ratio. In
addition, many non-abundant proteins may be translated at
sub-optimal rates. We have found that mRNAs for non-abundant
proteins  usually  have  sub-optimal  contexts  for  translational 
initiation.  For  example,  there  are  over  600  yeast  genes  which 
probably  have  short  open  reading  frames  in  the  mRNA  upstream  of 
the  main  open  reading  frame  (Latter  and  Futcher,  unpublished). 
These  may  be  devices  for  reducing  the  amount  of  protein  made  from 
a  molecule  of  mRNA. 

Correlation  of  Codon  Bias  with  protein  abundance. 

The  mRNAs  for  highly-expressed  proteins  preferentially  use 
some  codons  rather  than  others  specifying  the  same  amino  acid  (14). 
This  preference  is  called  codon  bias.  The  codons  preferred  are  those 
for  which  the  tRNAs  are  present  in  the  greatest  amounts.  Use  of 
these  codons  may  make  translation  faster  or  more  efficient  and  may 
decrease  misincorporation.  These  effects  are  most  important  for  the 
cell  for  abundant  proteins,  and  so  codon  bias  is  most  extreme  for 
abundant proteins. The effect can be dramatic: highly biased
mRNAs  may  use  only  25  of  the  61  codons. 
We asked whether the correlation of codon bias with
abundance continues for medium abundance proteins. There are
various  mathematical  expressions  quantifying  codon  bias;  here,  we 
have  used  the  “codon  adaptation  index”  (CAI)  (24)  (Materials  and 
Methods)  because  it  gives  a  result  between  0  and  1.  The  Spearman 
rank  correlation  for  codon  adaptation  index  versus  protein 
abundance is 0.80 (p < 0.0001), similar to the mRNA-protein
correlation,  confirming  a  strong  correlation  between  codon 
adaptation  index  and  protein  abundance  (Fig.  3).  The  relationship 
between  CAI  and  protein  abundance  is  log-linear  from  about 
1,000,000  molecules  of  protein  per  cell  to  about  10,000  molecules 
per  cell.  We  have  no  data  for  rarer  proteins. 

It  is  not  clear  whether  CAI  reflects  maximum  or  average  levels 
of protein expression. The proteins used for the CAI-protein
correlation included some proteins which were not expressed at
maximum levels under the condition of the experiment (e.g., Hsc82,
Hsp104, Ssa1, Ade1, Arg4, His4, and others). When these proteins
were removed from consideration, and the correlation between CAI
and the remaining (presumably constitutive) proteins was
recalculated, the Spearman correlation coefficient was essentially
unchanged (not shown).

The  equation  describing  the  graph  in  Fig.  3  is:  log(protein 
molecules/cell) = (2.3 x CAI) + 3.7. Thus, under certain conditions (a
CAI  of  0.3  or  greater;  a  constitutively  expressed  gene)  a  very  rough 
estimate  of  protein  abundance  can  be  made  by  raising  10  to  the 
power  of  [(2.3  x  CAI)  +  3.7]. 
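
As a small illustration, the rough estimate described above can be written
as a Python function. The function name is ours, and the range check simply
encodes the stated condition (a CAI of 0.3 or greater; CAI cannot exceed
1). The result is only meaningful for constitutively expressed genes.

```python
def estimate_protein_molecules_per_cell(cai):
    """Very rough estimate from the Fig. 3 fit:
    log10(protein molecules/cell) = (2.3 x CAI) + 3.7."""
    if not 0.3 <= cai <= 1.0:
        raise ValueError("the fit is described only for CAI from 0.3 to 1.0")
    return 10 ** (2.3 * cai + 3.7)


for cai in (0.3, 0.5, 0.8):
    molecules = estimate_protein_molecules_per_cell(cai)
    print(f"CAI {cai:.1f}: about {molecules:,.0f} molecules per cell")
```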

The  distribution  of  CAI  over  the  genome  (Fig.  4)  consists  of  a 
lower,  bell-shaped  distribution,  possibly  indicating  a  region  where 
there  is  no  selection  for  codon  bias;  and  an  upper,  flat  distribution, 
starting  at  a  CAI  of  about  0.3,  possibly  indicating  a  region  where 
there  is  selection  for  codon  bias.  Almost  all  of  the  proteins  whose 
abundance  we  have  measured  are  in  the  upper,  flat  portion  of  the 
distribution.  In  the  lower,  bell-shaped  region,  we  do  not  know 
whether  there  is  a  correlation  between  CAI  and  protein  abundance. 

Changes  in  protein  abundance  in  glucose  and  ethanol. 

A comparison of cells grown in glucose (Fig. 1A) with cells
grown in ethanol (Fig. 1B) is shown in Table 1. As is well known,
some proteins are induced tremendously during growth on ethanol.
Two striking examples are the peroxisomal enzymes Icl1 (isocitrate
lyase) and Cit2 (citrate synthase), which are induced in ethanol by
more than 100-fold, and 12-fold, respectively (Fig. 1, Table 1). These
enzymes  are  key  components  of  the  glyoxylate  shunt,  which  diverts 
some  acetyl-CoA  from  the  tricarboxylic  acid  cycle  to  gluconeogenesis. 
S.  cerevisiae  requires  large  amounts  of  carbohydrate  for  its  cell  wall, 
and  in  ethanol  medium,  this  carbohydrate  comes  from 
gluconeogenesis  which  depends  on  the  glyoxylate  shunt  and  on  the 
glycolytic  pathway  running  in  reverse.  The  need  for  gluconeogenesis 
also  explains  why  glycolytic  enzymes  are  abundant  even  in  ethanol 
medium.  Thus,  2D  gel  analysis  shows  the  prominence  of  the 
glycolytic  and  glyoxylate  shunt  enzymes  in  cells  grown  on  ethanol, 
emphasizing  that  gluconeogenesis,  presumably  largely  for  production 
of  the  cell  wall,  is  a  major  metabolic  activity  under  these  conditions. 

During  gluconeogenesis,  substrate/product  relationships  are 
reversed  for  the  glycolytic  enzymes.  One  might  expect  that  not  all 
glycolytic  enzymes  would  be  well-adapted  to  the  reverse  reaction. 
Indeed, 2D gels show that in ethanol, Adh2 (alcohol dehydrogenase
2) is strongly induced (16), while its isozyme Adh1 is not greatly
affected. Adh1 and Adh2 each interconvert acetaldehyde and
ethanol. Adh1 has a relatively high Km for ethanol (17 mM), while
Adh2 has a lower Km (0.8 mM) (5). Thus it is thought that Adh1 is
specialized for glycolysis (acetaldehyde to ethanol), while Adh2 is
specialized for respiration (ethanol to acetaldehyde) (5, 29).

Similarly, Eno1 (enolase 1) is induced in ethanol, while its isozyme
Eno2 (enolase 2) decreases in abundance (Table 1) (4, 19). Eno1 is
inhibited by 2-phosphoglycerate (the glycolytic substrate) while
Eno2 is inhibited by phosphoenolpyruvate (the gluconeogenic
substrate) (4). Perhaps Eno1 has a lower Km for
phosphoenolpyruvate than does Eno2, though to our knowledge this
has not been tested. Thus, the 2D gels distinguish isozymes
specialized for growth on glucose (Adh1 and Eno2) from isozymes
specialized for ethanol (Adh2 and Eno1).

Many  heat  shock  proteins  were  about  2-fold  more  abundant  in 
ethanol medium than in glucose (e.g., Hsp60, Hsp82, Hsp104, and
Kar2).  This  is  consistent  with  the  increased  heat  resistance  of  cells 
grown  in  ethanol  (3). 

Enzymes involved in protein synthesis (Eft1, Rpa0, Tif1) were
about  twice  as  abundant  in  glucose  as  in  ethanol  medium.  This  may 
reflect  the  higher  growth  rate  of  the  cells  in  glucose. 
Phosphorylation of Proteins.

To examine protein phosphorylation, we labeled cells with 32P
and ran 2D gels to examine phosphoproteins. About 300 distinct
spots could be seen on pH 4-8 gels (Fig. 5B), probably representing
150 to 200 proteins. We then aligned autoradiograms of three gels,
each with a different kind of labeled protein (32P only, Fig. 5B; 32P
plus 35S, Fig. 5A; and 35S only, not shown, but see Fig. 1 for example).
In this way, we made provisional identification of some of the
32P-labeled spots as particular 35S-labeled spots. All such identifications
are somewhat uncertain, since precise alignments are difficult, and of
course multiple spots may exactly co-migrate. Nevertheless, we
believe that most of the provisional identifications are probably
correct. Among the major 32P-labeled proteins are the hexokinases
Hxk1 and Hxk2, the acidic ribosome-associated protein Rpa0, the
translation factors Yef3 and Efb1, and probably Hsp70 heat-shock
proteins of the Ssa and Ssb families. Rpa0 and Efb1 are
quantitatively mono-phosphorylated.

Many yeast proteins resolve into multiple spots on these 2D
gels (7). Yef3 has 5 or more spots, at least four of which co-migrate
with 32P. Tpi1 has a major spot showing no 32P labeling, and a minor,
more acidic spot which overlaps with some 32P label. Tif1 has at
least seven spots (7). Two of these overlap with some 32P label, but
five do not (Fig. 5). Eft1 has at least three spots (7), and none of
these overlap with 32P, although there are three nearby, unidentified
32P-labeled spots (a, c, and d in Fig. 5). Spots that seem to be extra
forms of Met6, Pdc1, Eno2, and Fba1 can be seen in Fig. 6A, but there
is little 32P at these positions in Fig. 5. Thus, phosphorylation
explains some but not all of the different protein isoforms seen.

The cell cycle is regulated in part by phosphorylation. We
compared 32P-labeled proteins from cells synchronized in G1 with
α-factor, from cells synchronized in G1 by depletion of G1 cyclins, and
from cells synchronized in M phase with nocodazole. Only very minor
differences  were  seen,  and  these  were  difficult  to  reproduce.  The 
cell  cycle  proteins  regulated  by  phosphorylation  may  not  be 
abundant  enough  for  this  technique  to  be  applied  easily. 

Centrifugal  Fractionation. 

We fractionated 35S-labeled extracts by centrifugation
(Materials and Methods). Fig. 6A shows the proteins in the
supernatant of a high speed (100,000 g, 30 min.) centrifugation,
while Fig. 6B shows the proteins in the pellet of a low speed (16,000
g, 10 min.) centrifugation. Many proteins are tremendously enriched
in one fraction or the other, while others are present in both. Most
glycolytic enzymes are enriched in the supernatant fraction (e.g.,
Tdh2, Tdh3, Eno2, Pdc1, Adh1, Fba1). The only exception is Pfk1 (not
indicated), which is found in both pellet and supernatant fractions.
Many proteins involved in protein synthesis (Eft1, Yef3, Prt1, Tif1,
Rpa0) are in the pellet, possibly because of the association of
ribosomes with the endoplasmic reticulum. However, Efb1 is in the
supernatant, as is a substantial portion of the Eft1. Perhaps
surprisingly, several mitochondrial proteins (Atp2 (not shown), Ilv5)
are largely in the supernatant. Perhaps glass bead breakage of cells
releases mitochondrial proteins. The nuclear protein Gsp1 is in the
pellet fraction. The enrichment produced by centrifugation makes it
possible to see minor spots which are otherwise poorly resolved from
surrounding proteins. Fig. 6B shows that the previously identified
Tif1 spot is surrounded by as many as 6 other spots that
co-fractionate. We observed 6 identical or very similar additional spots
when we over-expressed Tif1 from a high copy number plasmid (not
shown). Signal overlaps only one or two of these spots in
32P-labeling experiments (Fig. 5), so the different forms are not mainly
due to different phosphorylation states.

Discussion 

Our  experience  with  developing  a  2D  gel  protein  database  for  S. 
cerevisiae  is  summarized  here.  With  current  technology,  we  can  see 
the  most  abundant  1200  proteins,  which  is  about  one-third  to  one- 
quarter  of  the  proteins  expressed.  The  remaining  proteins  will  be 
difficult  to  see  and  study  with  the  methods  we  have  used,  not 
because  of  a  lack  of  sensitivity,  but  because  weak  spots  are  covered 
by  nearby  strong  spots. 

Of  the  1200  proteins  seen,  we  have  identified  148,  with  a  bias 
towards  the  most  abundant  proteins.  Steady  application  of  the 
methods  already  used  would  allow  identification  of  most  of  the 
remaining  proteins.  Gene  over-expression  will  be  particularly  useful, 
since  it  is  not  affected  by  the  lower  abundance  of  the  remaining 
visible  proteins. 

2D  gels  of  the  kind  we  have  used  are  not  suitable  for 
visualization  of  rare  proteins.  However,  it  will  be  possible  to  study 
on  a  global  basis  metabolic  processes  involving  relatively  abundant 
proteins, such as protein synthesis, glycolysis, gluconeogenesis, amino
acid synthesis, cell wall synthesis, nucleotide synthesis, lipid
metabolism and the heat shock response.

Gygi  et  al.  (10)  have  recently  completed  a  study  similar  to  ours. 
Despite  generating  broadly  similar  data,  Gygi  et  al.  reached  markedly 
different  conclusions.  We  believe  that  both  mRNA  abundance  and 
codon  bias  are  useful  predictors  of  protein  abundance.  However, 
Gygi  et  al.  feel  that  mRNA  abundance  is  a  poor  predictor  of  protein 
abundance,  and  that  “codon  bias  is  not  a  predictor  of  either  protein 
or  mRNA  levels”.  These  different  conclusions  are  partly  a  matter  of 
viewpoint.  Gygi  et  al.  focus  on  the  fact  that  the  correlations  of  mRNA 
and  codon  bias  with  protein  abundance  are  far  from  perfect,  while 
we  focus  on  the  fact  that,  considering  the  wide  range  of  mRNA  and 
protein  abundance  and  the  undoubted  presence  of  other  mechanisms 
affecting  protein  abundance,  the  correlations  are  quite  good. 

However,  the  different  conclusions  are  also  partly  due  to 
different  methods  of  statistical  analysis,  and  to  real  differences  in 
data.  With  respect  to  statistics,  Gygi  et  al.  used  the  Pearson  product- 
moment  correlation  coefficient  (rp)  to  measure  the  covariance  of 
mRNA  and  protein  abundance.  Depending  on  the  subset  of  data 
included,  their  rp  values  ranged  from  0.1  to  0.94.  Because  of  the  low 
rp values with some subsets of the data, Gygi et al. concluded that the
correlation  of  mRNA  to  protein  was  poor.  However,  the  Pearson 
correlation  is  a  parametric  statistic,  and  so  requires  variates 
following  a  bivariate  normal  distribution;  that  is,  it  would  be  valid 
only  if  both  mRNA  and  protein  abundances  were  normally 
distributed.  In  fact,  both  distributions  are  very  far  from  normal 
(data  not  shown),  and  so  Pearson  correlation  coefficients  are 
inappropriate.  There  was  no  statistical  backing  for  the  assertion  that 
codon  bias  fails  to  predict  protein  abundance. 

We have taken two statistical approaches. First, we have used
the Spearman rank correlation coefficient (rs). Since this statistic is
non-parametric, there is no requirement for the data to be normally
distributed. Using rs, we find that mRNA abundance is well
correlated with protein abundance (rs = 0.74), and the codon
adaptation index is also well correlated with protein abundance (rs =
0.80) (and also with mRNA abundance, data not shown). For the data
of Gygi et al. (10), we obtained similar results, though with their data
the correlation is not as good: rs = 0.59 for the mRNA to protein
correlation, and rs = 0.59 for the codon bias to protein correlation.

In a second approach, we transformed the mRNA and protein
data to forms in which they were normally distributed, to allow
calculation of a Pearson correlation coefficient (Materials and
Methods). Two transformations were used: a Box-Cox transformation,
and a logarithmic transformation. Both methods gave good correlations
with our data (e.g., rp = 0.76 for log(adjusted mRNA) to log(protein)).
We were not able to transform the data of Gygi et al. to a normal
distribution.
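
The two approaches can be illustrated with a minimal sketch in Python. It uses only a small subset of rows from Table 1 (adjusted mRNA and protein in glucose), so it is not expected to reproduce the coefficients reported above; the function names are our own, and this is not the statistics package used for the analysis.

```python
import math

def ranks(values):
    """Return 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson product-moment correlation coefficient (rp)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman rank correlation (rs): Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# name: (adjusted mRNA copies per cell, protein molecules per cell in thousands)
# Values copied from Table 1; the choice of rows is arbitrary.
table1_subset = {
    "Adh1": (197, 1230), "Eno2": (248, 650), "Fba1": (179, 640),
    "Pdc1": (226, 280), "Pfk1": (5, 75), "Pgi1": (14, 160),
    "Act1": (54, 205), "His4": (3, 15),
}
mrna = [m for m, _ in table1_subset.values()]
protein = [p * 1000 for _, p in table1_subset.values()]

rs = spearman(mrna, protein)
rp_log = pearson([math.log10(m) for m in mrna],
                 [math.log10(p) for p in protein])
print(f"Spearman rs = {rs:.2f}; Pearson rp on log10 data = {rp_log:.2f}")
```

Because rs depends only on ranks, it is unchanged by the logarithmic transformation, whereas rp is computed here only after the transformation.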

Finally,  there  are  also  some  differences  in  data  between  the 
two  studies.  These  may  be  partly  due  to  the  different  measurement 
techniques  used:  Gygi  et  al.  measured  protein  abundance  by  cutting 
spots  out  of  gels,  and  measuring  the  radioactivity  in  each  spot  by 
scintillation  counting,  whereas  we  used  phosphorimaging  of  intact 
gels  coupled  to  image  analysis.  We  compared  our  data  to  theirs  for 
the  proteins  common  between  the  studies  (but  excluding  proteins 
whose  mRNAs  are  known  to  differ  between  rich  and  minimal  media, 
and excluding Tif1, which was anomalous in differing by 100-fold
between the two data sets). The Spearman rank correlation between
the two protein data sets was 0.88 (p < 0.0001). Although this is a
strong correlation, the fact that it is less than 1.0 suggests that there
may have been errors in measuring protein abundance in one or
both studies. After normalizing the two data sets to assume the
same amount of protein per cell, we found a systematic tendency for
the protein abundance data of Gygi et al. to be slightly higher than
ours for the highest abundance proteins and also for the lowest
abundance proteins, but slightly lower than ours for the middle
abundance proteins. These systematic differences suggest some
systematic errors in protein measurement. Although we do not
know what the errors are, we suggest the following as a reasonable
speculation: For the highest abundance proteins, we may have
underestimated the amount of protein because of a slightly
non-linear response of the phosphorimager screens. For the lowest
abundance proteins, Gygi et al. may have over-estimated the amount
of protein because of difficulties in accurately cutting very small
spots out of the gel, and because of difficulties in background
subtraction for these small, weak spots. The difference in the middle
abundance proteins may be a consequence of normalization, given
the two errors above.

The  low  abundance  proteins  in  the  data  set  of  Gygi  et  al.  have  a 
poor  correlation  with  mRNA  abundance.  We  calculate  that  the 
Spearman correlation coefficient, rs, is 0.74 for the top 54 proteins of
Gygi et al., but only rs = 0.22 for the bottom 53 proteins, a
statistically significant difference. However, with our data set, rs =
0.62 for the top 33 proteins, and rs = 0.56 (not significantly different)
for  the  bottom  33  proteins  (which  are  comparable  in  abundance  to 
the  bottom  53  proteins  of  Gygi  et  al.).  Thus,  our  data  set  maintains  a 
good  correlation  between  mRNA  and  protein  abundance  even  at  low 
protein  abundance.  This  is  consistent  with  our  speculation  that 
protein  quantification  by  phosphorimaging  and  image  analysis  may 
be  more  accurate  for  small,  weak  spots  than  is  cutting  out  spots 
followed  by  scintillation  counting.  Our  relatively  good  correlations 
even  for  non-abundant  proteins  may  also  reflect  the  fact  that  we 
used  both  SAGE  data  and  RNA  hybridization  data,  which  is  most 
helpful  for  the  least  abundant  mRNAs.  In  summary,  we  feel  that  the 
poor  correlation  of  protein  to  mRNA  for  the  non-abundant  proteins  of 
Gygi  et  al.  may  reflect  difficulty  in  accurately  measuring  these  non- 
abundant  proteins  and  mRNAs,  rather  than  indicating  a  truly  poor 
correlation  in  vivo.  It  is  not  surprising  that  observed  correlations 
would  be  poorer  with  less  abundant  proteins  and  mRNAs,  simply 
because  the  accuracy  of  measurement  would  be  worse. 

How well can mRNA abundance predict protein abundance?
With rp = 0.76 for logarithmically transformed mRNA and protein
data, the coefficient of determination, (rp)^2, is 0.58. This means that
more than half (in log space) of the variation in protein abundance is
explained by variation in mRNA abundance. When converted back to
arithmetic values, protein abundances vary over about 200-fold
(Table 1), and (rp)^2 = 0.58 for the log data means that of this 200-fold
variation, about 20-fold is explained by variation in the abundance
of mRNA, and about 10-fold is unexplained (but could be due partly
to measurement errors). For proteins much less abundant than those
considered here, we imagine the in vivo correlation between mRNA
and protein abundance will be worse, and other regulatory
mechanisms such as protein turnover will be more important.
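
The arithmetic behind the 20-fold and 10-fold figures can be sketched as follows. This is one reading of the calculation, assuming that the 200-fold range is split on a logarithmic scale in proportion to (rp)^2; the variable names are illustrative.

```python
r_p = 0.76            # Pearson coefficient for the log-transformed data
fold_range = 200.0    # approximate range of protein abundance in Table 1

r_squared = r_p ** 2  # coefficient of determination, about 0.58

# Split the range on a log scale: a fraction r_squared of log(200) is
# attributed to mRNA abundance, and the remainder is left unexplained.
explained_fold = fold_range ** r_squared
unexplained_fold = fold_range ** (1 - r_squared)

print(f"(rp)^2 = {r_squared:.2f}")
print(f"explained: about {explained_fold:.0f}-fold")
print(f"unexplained: about {unexplained_fold:.0f}-fold")
```

The two factors multiply back to the full 200-fold range, which is why the explained and unexplained parts are quoted as folds rather than as percentages of the arithmetic range.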

Some important conclusions can be drawn from this sampling
of the proteome. First, there is an enormous range of protein
abundance, from nearly 2,000,000 molecules per cell for some
glycolytic enzymes to about 100 per cell for some cell cycle proteins
(Tyers and Futcher, unpublished). Second, about half of all cellular
protein is found in fewer than 100 different gene products, which are
mostly involved in carbohydrate metabolism or protein synthesis.
Third, the correlation between protein abundance and codon
adaptation index is log-linear as far as we can see, which is from
about 10,000 protein molecules per cell to about 1,000,000. This is
somewhat surprising, because it implies that selective forces for
codon bias are significant even at moderate expression levels. It also
means that codon bias is a useful predictor of protein abundance
even for moderately low-bias proteins. Fourth, there is a good
correlation between protein abundance and mRNA abundance for the
proteins we have studied. This validates the use of mRNA
abundance as a rough predictor of protein abundance, at least for
relatively abundant proteins. Fifth, for these abundant proteins,
there are about 4,000 molecules of protein for each molecule of
mRNA. This last conclusion raises questions as to how the levels of
non-abundant proteins are regulated, and suggests that protein
instability, regulated translation, sub-optimal rates of translation and
other mechanisms in addition to transcriptional control may be very
important for these proteins.
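
As a rough check of the protein-to-mRNA ratio, the sketch below divides protein molecules per cell by mRNA molecules per cell for a handful of abundant proteins from Table 1. The choice of rows and the use of the median are our own assumptions for illustration, not the procedure used to arrive at the figure of about 4,000.

```python
from statistics import median

# name: (SAGE mRNA copies per cell, protein molecules per cell in glucose, in thousands)
# Values copied from Table 1.
abundant = {
    "Adh1": (197, 1230),
    "Eno2": (248, 650),
    "Fba1": (179, 640),
    "Pdc1": (226, 280),
    "Tdh3": (460, 1670),
    "Act1": (54, 205),
}

# Protein molecules per mRNA molecule for each gene product.
ratios = {name: prot * 1000 / mrna for name, (mrna, prot) in abundant.items()}
typical_ratio = median(ratios.values())

for name, ratio in ratios.items():
    print(f"{name}: {ratio:,.0f} proteins per mRNA")
print(f"median: {typical_ratio:,.0f}")
```

Individual ratios vary several-fold around the median, which is consistent with mRNA abundance being a rough rather than exact predictor.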

Materials and methods


Strains  and  Media. 

Strain W303 (MATa ade2-1 his3-11,15 leu2-3,112 trp1-1 ura3-1
can1-100) was used (26). -met YNB medium was 1.7 g/L of Yeast
Nitrogen Base (Difco), 5 g/L of ammonium sulfate, and adenine, uracil,
and all amino acids except methionine; -met -cys medium was the same,
but without methionine or cysteine. Medium was supplemented with 2%
glucose for most experiments, or with 2% ethanol for ethanol
experiments. Low phosphate YEPD was described by Warner (28).

Isotopic Labeling of Yeast and Preparation of Cell Extracts.

Yeast were labeled and proteins extracted as described by
Garrels et al. (7, 8). Briefly, cells were grown to 5 x 10^ cells per ml
at 30°C. 1 ml of culture was transferred to a fresh tube, and 0.3 mCi
of 35S-methionine (e.g., Express Protein Labeling mix, New England
Nuclear) was added to this 1 ml culture. The cells were incubated for
a further 10 to 15 min. The cells were transferred to a 1.5 ml
microcentrifuge tube, chilled on ice, and harvested by centrifugation.
The supernatant was removed, and the cell pellet was resuspended
in 100 µl of lysis buffer. (Lysis buffer was 20 mM Tris-HCl, pH 7.6;
10 mM NaF; 10 mM sodium pyrophosphate; 0.5 mM EDTA; 0.1%
deoxycholate. Just before use, PMSF was added to the lysis buffer to
1 mM, leupeptin was added to 1 µg/ml, pepstatin was added to 1
µg/ml, TPCK was added to 10 µg/ml, and soybean trypsin
inhibitor was added to 10 µg/ml.)

The  resuspended  cells  were  transferred  to  a  screw-cap  1.5  ml 
polypropylene  tube  containing  0.28  g  of  glass  beads  (0.5  mm 
diameter,  Biospec  Products),  or  0.40  g  of  zirconia  beads  (0.5  mm, 
Biospec  Products).  After  securing  the  cap,  the  tube  was  inserted  into 
a  MiniBeadbeater  8  (Biospec  Products)  and  shaken  at  medium  high 
speed  at  4°C  for  1  min.  Breakage  was  typically  75%.  Tubes  were 
then  spun  in  a  microcentrifuge  for  10  sec.  at  5,000  g  at  4°C. 

Using a very fine pipet tip, liquid was withdrawn from the
beads and transferred to a pre-chilled 1.5 ml tube containing 7 µl of
DNase/RNase/Mg mix (0.5 mg/ml DNase I, Cooper #6330; 0.25 mg/ml
RNase A, Cooper #5679; 50 mM MgCl2). Typically 70 µl of liquid was
recovered. The mixture was incubated on ice for 10 min. to allow the
RNase and DNase to work.

Next, 75 µl of 2 x dSDS (2 x dSDS = 0.6% SDS, 2% mercaptoethanol,
0.1 M Tris-HCl, pH 8) was added. The tube was plunged into
boiling water and incubated for 1 min. It was then plunged into ice.
After cooling, the tube was centrifuged at 4°C for 3 min. at 14,000 g.
The supernatant was transferred to a fresh tube and frozen at -70°C.
About 5 µl of this supernatant was used for each 2D gel.

Two  Dimensional  Polyacrylamide  Gels. 

2D  gels  were  made  and  run  as  described  (6,  7,  8). 

Image  Analysis  of  the  Gels. 

The  Quest  II  software  system  was  used  for  quantitative  image 
analysis  (20,  22).  Two  techniques  were  used  to  collect  quantitative 
data  for  analysis  by  Quest  II  software.  First,  before  the  advent  of 
phosphorimagers,  gels  were  dried  and  fluorographed.  Each  gel  was 
exposed  to  film  for  three  different  times  (typically  1  day,  2  weeks 
and  6  weeks)  to  increase  the  dynamic  range  of  the  data.  The  films 
were  scanned  along  with  calibration  strips  to  relate  film  optical 
density  (OD)  to  disintegrations  per  minute  (dpm)  in  the  gels  and 
analyzed  by  the  software  to  obtain  a  linear  relationship  between 
dpm in the spots and OD of the film images. The quantitative data are
expressed as parts per million (ppm) of the total cellular protein.
This value is calculated from the dpm of the sample loaded onto the
gel and by comparing the film density of each data spot with the density
of the film over the calibration strips of known radioactivity exposed
to the same film. This yields the dpm/mm for each spot on the gel
and thence its ppm value.

After the advent of phosphorimaging, gels bearing 35S-labeled
proteins  were  exposed  to  phosphorimager  screens,  and  scanned  by  a 
Fuji  phosphorimager,  typically  for  two  exposures  per  gel.  Calibration 
strips  of  known  radioactivity  were  exposed  simultaneously.  Scan 
data  from  the  phosphorimager  was  assimilated  by  Quest  II  software, 
and  quantitative  data  were  recorded  for  the  spots  on  the  gels. 

Measurements  of  Protein  turnover. 

Cells in exponential phase were pulse-labeled with
35S-methionine, an excess of cold met and cys was added, and samples
of equal volume were taken from the culture at intervals up to 90
min. (in one experiment) or up to 160 min. (in a second experiment).
Incorporation of 35S into protein was essentially 100% by the first
sample (10 min.). Extracts were made, and equal fractions of the
samples were loaded on 2D gels (i.e., the different samples had
different amounts of protein, but equal amounts of 35S). Spots were
quantitated using phosphorimaging and Quest software.

The  software  was  queried  for  spots  whose  radioactivity 
decreased  through  the  time  course.  The  algorithm  examined  all  data 
points  for  all  spots,  drew  a  best-fit  line  through  the  data  points,  and 
looked  for  spots  where  this  line  had  a  statistically  significant 
negative  slope.  In  one  of  the  experiments,  there  was  one  such  spot. 
To  the  eye,  this  was  a  minor,  unidentified  spot  seen  only  in  the  first 
two  samples  (10  and  20  min.).  In  the  other  experiment,  the  Quest 
software found no spots meeting the criteria. Therefore we
concluded that all of the identified spots (and all but one of the
visible spots) represented proteins with long half-lives.
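
The query described above (fit a line to each spot's time course and ask whether the slope is significantly negative) can be sketched as follows. This is not the Quest algorithm, whose details we do not describe here; the one-sided t-test at the 5% level, the function names, and the two example time courses are assumptions made for illustration.

```python
import math

# One-sided 5% critical values of Student's t for 1 to 10 degrees of freedom.
T_CRIT_ONE_SIDED_05 = [6.314, 2.920, 2.353, 2.132, 2.015,
                       1.943, 1.895, 1.860, 1.833, 1.812]

def fit_slope(times, counts):
    """Return the least-squares slope and its t-statistic (slope / std. error)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_c = sum(counts) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (c - mean_c) for t, c in zip(times, counts))
    slope = sxy / sxx
    intercept = mean_c - slope * mean_t
    sse = sum((c - (intercept + slope * t)) ** 2 for t, c in zip(times, counts))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    if std_err == 0:  # perfect fit: the t-statistic is unbounded (or 0 if flat)
        return slope, (math.copysign(math.inf, slope) if slope else 0.0)
    return slope, slope / std_err

def is_decaying(times, counts):
    """True if the best-fit line has a significantly negative slope."""
    df = len(times) - 2
    if not 1 <= df <= len(T_CRIT_ONE_SIDED_05):
        raise ValueError("this sketch only tabulates 1 to 10 degrees of freedom")
    _, t_stat = fit_slope(times, counts)
    return t_stat < -T_CRIT_ONE_SIDED_05[df - 1]

# Hypothetical spot intensities (arbitrary units) at five chase times (min).
chase_times = [10, 20, 40, 80, 160]
stable_spot = [1000, 1010, 990, 1005, 995]
unstable_spot = [1000, 900, 700, 450, 150]

print("stable spot decaying:", is_decaying(chase_times, stable_spot))
print("unstable spot decaying:", is_decaying(chase_times, unstable_spot))
```

A spot that fluctuates around a constant value yields a small t-statistic and is not flagged, which corresponds to a protein with a long half-life on the time scale of the chase.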

Centrifugal  Fractionation. 

Cells were labeled, harvested, and broken with glass beads by
the  standard  method  described  above,  except  that  no  detergent  (i.e., 
no  deoxycholate)  was  present  in  the  lysis  buffer.  The  crude  lysate 
was  cleared  of  unbroken  cells  and  large  debris  by  centrifugation  at 
300  g  for  30  sec.  The  supernatant  of  this  centrifugation  was  then 
spun  at  16,000  g  for  10  min.  to  give  the  pellet  used  for  Fig.  6B.  The 
supernatant of the 16,000 g, 10 min. spin was then spun at 100,000
g for 30 min. to give the supernatant used for Fig. 6A.

Protein  abundance  calculations. 

A haploid yeast cell contains about 4 x 10^-12 g of protein (1, 15).
Assuming a mean protein mass of 50 kDa, there are about 50 x 10^6
molecules of protein per cell. There are about 1.8 methionines per
10 kDa of protein mass, which implies 4.5 x 10^8 molecules of
methionine per cell (neglecting the small pool of free met). We
measured the cpm in each spot on the 2D gels; we measured the total
number of cpm on each gel (by integrating counts over the entire
gel); and we measured the total number of cpm loaded on the gel (by
scintillation counting of the original sample). Thus we know what
fraction of the total incorporated radioactivity is present in each spot.
After correcting for the methionine (and cysteine, see below) content
of each protein, we calculated an absolute number of protein
molecules based on the fraction of radioactivity in each spot, and
based on 50 x 10^6 total molecules per cell.

The labeling mixture used contained about one-fifth as much
radioactive cysteine as radioactive methionine. Therefore, the
number of cysteine molecules per protein was also taken into
account in calculating the number of molecules of protein, but cys
molecules were weighted one-fifth as heavily as met molecules.
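
The calculation can be sketched as below. The constants come from the text; the function name and the example protein are hypothetical, and for simplicity the sketch assumes that the total label per cell is the 4.5 x 10^8 methionine equivalents quoted above, without adding an average cysteine contribution.

```python
AVOGADRO = 6.022e23

protein_mass_per_cell_g = 4e-12   # about 4 pg of protein per haploid cell
mean_protein_mass_da = 50_000     # assumed mean protein mass (50 kDa)
met_per_10_kda = 1.8

# About 50 x 10^6 protein molecules per cell.
total_molecules = protein_mass_per_cell_g / mean_protein_mass_da * AVOGADRO

# About 4.5 x 10^8 methionines per cell, using the rounded 50 x 10^6 figure.
met_per_cell = 50e6 * (mean_protein_mass_da / 10_000) * met_per_10_kda

def molecules_per_cell(spot_fraction, n_met, n_cys,
                       label_units_per_cell=4.5e8, cys_weight=0.2):
    """Estimate molecules per cell from a spot's share of total radioactivity.

    spot_fraction: cpm in the spot divided by total incorporated cpm.
    n_met, n_cys: methionines and cysteines in the mature protein.
    cys_weight: cysteine counts one-fifth as heavily as methionine.
    """
    label_units = n_met + cys_weight * n_cys
    if label_units == 0:
        raise ValueError("protein carries no label (the 'No Met' case)")
    return spot_fraction * label_units_per_cell / label_units

# Hypothetical protein: 9 Met, 0 Cys, holding 0.1% of the incorporated label.
example = molecules_per_cell(0.001, n_met=9, n_cys=0)
print(f"total molecules per cell: {total_molecules:.2e}")
print(f"example protein: {example:,.0f} molecules per cell")
```

A protein with the average methionine content (9 per 50 kDa) and 0.1% of the label comes out at 0.1% of the 50 x 10^6 molecules, as expected; a methionine-rich protein with the same spot intensity would be assigned proportionally fewer molecules.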

mRNA abundance calculations.

For  estimation  of  mRNA  abundance,  we  used  SAGE  data  (27), 
and  Affymetrix  chip  hybridization  data  (30;  L.  Wodicka,  pers. 
comm.).  The  mRNA  column  in  Table  1  shows  mRNA  abundance 
calculated  from  SAGE  data  alone.  However,  the  SAGE  data  came  from 
cells  growing  in  YEPD  medium,  whereas  our  protein  measurements 
were  from  cells  growing  in  YNB  medium.  In  addition,  SAGE  data  for 
low  abundance  mRNAs  suffers  from  statistical  variation.  Therefore 
we  also  used  chip  hybridization  data  (30  and  Wodicka,  pers.  comm.) 
for  mRNA  from  cells  grown  in  YNB.  These  hybridization  data  also 
had disadvantages. First, the abundance of high abundance mRNAs
was systematically underestimated, probably because of saturation
in the hybridizations, which used 10 µg of cRNA. For example, the
abundance of ADH1 mRNA was 197 copies per cell by SAGE, but only
32 copies per cell by hybridization, and the abundance of ENO2
mRNA was 248 copies per cell by SAGE but only 41 by hybridization.
When the amount of cRNA used in the hybridization was reduced to
1 µg, the apparent amounts of mRNA were similar to the amounts
determined by SAGE (L. Wodicka, unpublished results, pers. comm.).
However, experiments using 1 µg of cRNA have been done for only
some genes (L. Wodicka, pers. comm.). Because amounts of mRNA
were normalized to 15,000 per cell, and because the amounts of
abundant mRNAs were underestimated, there is a 2.2 fold
overestimate of the abundance of non-abundant mRNAs. We calculated
this  factor  of  2.2  by  adding  together  the  number  of  mRNA  molecules 
from  a  large  number  of  genes  expressed  at  a  low  level  for  both  SAGE 
data  and  hybridization  data.  The  sum  for  the  same  genes  from 
hybridization  data  is  2.2  fold  greater  than  from  SAGE  data. 

To  take  into  account  these  difficulties,  we  compiled  a  list  of 
“adjusted” mRNA abundance as follows: For all high abundance
mRNAs  of  our  identified  proteins,  we  used  SAGE  data.  For  all  of 
these  particular  mRNAs,  chip  hybridization  suggested  that  mRNA 
abundance  was  the  same  in  YEPD  and  YNB  media.  For  medium 
abundance  mRNAs,  SAGE  data  was  used,  but,  when  hybridization 
data  showed  a  significant  difference  between  YEPD  and  YNB,  then  the 
SAGE  data  was  adjusted  by  the  appropriate  factor.  Finally,  for  low 
abundance mRNAs, we used data from chip hybridizations from YNB
medium, but divided by 2.2 to normalize to the SAGE results. These
calculations  were  completed  without  reference  to  protein  abundance. 
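
The adjustment rules can be summarized in a short sketch. The abundance class is passed in by the caller, since the text gives no numerical cut-offs between high, medium and low abundance; taking the "appropriate factor" for medium abundance mRNAs to be the ratio of the YNB to the YEPD hybridization signal is our assumption, and all names are illustrative.

```python
CHIP_OVERESTIMATE = 2.2  # chip data overestimate low abundance mRNAs 2.2 fold

def adjusted_mrna(abundance_class, sage=None, chip_ynb=None, chip_yepd=None,
                  media_differ=False):
    """Return the adjusted mRNA copies per cell for one gene.

    abundance_class: "high", "medium" or "low" (assigned by the caller).
    sage: copies per cell from SAGE (cells grown in YEPD).
    chip_ynb, chip_yepd: copies per cell from chip hybridization.
    media_differ: True if hybridization shows a significant YEPD/YNB difference.
    """
    if abundance_class == "high":
        return sage                         # SAGE used directly
    if abundance_class == "medium":
        if media_differ:
            return sage * chip_ynb / chip_yepd  # assumed form of the factor
        return sage
    if abundance_class == "low":
        return chip_ynb / CHIP_OVERESTIMATE  # normalize chip data to SAGE
    raise ValueError(f"unknown abundance class: {abundance_class!r}")

print(adjusted_mrna("high", sage=197))
print(adjusted_mrna("medium", sage=10, chip_ynb=6, chip_yepd=3, media_differ=True))
print(adjusted_mrna("low", chip_ynb=2.2))
```

None of the rules refers to protein abundance, in keeping with the statement that the adjustment was completed without reference to the protein data.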

Codon  Adaptation  Index. 

The  Codon  Adaptation  Index  was  taken  from  the  Yeast 
Proteome  Database  (YPD)  (13)  for  which  calculations  were  made 
according  to  Sharp  and  Li  (24).  Briefly,  the  index  uses  a  reference 
set  of  highly  expressed  genes  to  assign  a  value  to  each  codon,  and 
then  a  score  for  a  gene  is  calculated  from  the  frequency  of  use  of  the 
various  codons  in  that  gene  (24). 
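
For readers unfamiliar with the index, the following sketch shows the general form of the calculation: each codon receives a relative adaptiveness value (its count in the reference set divided by the count of the most used synonymous codon), and the index for a gene is the geometric mean of these values over its codons. The reference counts and sequences below are invented for illustration, only two amino acids are included, and every codon in the toy reference set has a non-zero count; the values used in this work were taken from YPD, not computed this way by us.

```python
import math

def relative_adaptiveness(reference_counts, synonymous_families):
    """Assign each codon its count relative to the most used synonym."""
    w = {}
    for family in synonymous_families:
        top = max(reference_counts[codon] for codon in family)
        for codon in family:
            w[codon] = reference_counts[codon] / top
    return w

def codon_adaptation_index(coding_sequence, w):
    """Geometric mean of w over the codons of a gene that have a w value."""
    codons = [coding_sequence[i:i + 3]
              for i in range(0, len(coding_sequence) - 2, 3)]
    scored = [w[codon] for codon in codons if codon in w]
    return math.exp(sum(math.log(value) for value in scored) / len(scored))

# Hypothetical codon counts in a reference set of highly expressed genes.
reference_counts = {"AAG": 80, "AAA": 20, "GAA": 90, "GAG": 10}
families = [("AAG", "AAA"), ("GAA", "GAG")]   # Lys and Glu codons
w = relative_adaptiveness(reference_counts, families)

high_bias_gene = "AAGGAAAAGGAA"   # uses only the preferred codons
low_bias_gene = "AAAGAG"          # uses only the rarer codons
print(codon_adaptation_index(high_bias_gene, w))
print(codon_adaptation_index(low_bias_gene, w))
```

A gene built entirely from preferred codons scores 1, and the score falls as rarer synonymous codons are used.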

Statistical  Analysis. 

The “JMP” program was used with the aid of Dr. T. Tully. The
JMP program showed that neither mRNA nor protein abundances
were  normally  distributed;  therefore  Spearman  correlation 
coefficients  were  calculated.  The  mRNA  (adjusted  and  unadjusted) 
and  protein  data  were  also  transformed  so  that  Pearson  correlation 
coefficients  could  be  calculated.  First,  this  was  done  by  a  Box-Cox 
transformation  of  log-transformed  data.  This  transformation 
produced  normal  distributions,  and  a  Pearson  correlation  coefficient 
of  0.76  was  achieved.  However,  because  the  Box-Cox  transformation 
is complex, we also did a simpler logarithmic transformation. This
produced a normal distribution for the protein data. However, the
distribution  for  the  mRNA  and  adjusted  mRNA  data  was  close  to,  but 
not  quite,  normal.  Nevertheless,  we  calculated  the  Pearson 
correlation  coefficient,  and  found  that  it  was  0.76,  identical  to  the 
coefficient  from  the  Box-Cox  transformed  data.  We  therefore  believe 
that  this  correlation  coefficient  is  not  misleading,  despite  the  fact  that 
the  log(mRNA)  distribution  is  not  quite  normal. 

Table 1. Summary of Quantitative Data


Proteins are grouped roughly by function (carbohydrate metabolism;
protein synthesis; heat shock; amino acid synthesis; miscellaneous),
and are alphabetical within each group. The Codon Adaptation Index
(CAI), a measure of codon bias, is taken from the Yeast Proteome
Database. “mRNA” is the number of mRNA molecules per cell from
SAGE data (27), and “Aj. mRNA” (Adjusted mRNA) is the number of
mRNA molecules per cell based on both SAGE and chip hybridization
(30) (see Materials and Methods). “Protein (Glu)” is the number of
molecules of protein per cell in YNB glucose, and “Protein (Eth)” is the
number of molecules of protein per cell in YNB ethanol. “Ratio E/G” is
the ratio of protein abundance in ethanol to glucose. Protein
molecules are shown in thousands; for instance, there are 1,230,000
molecules of Adh1 per cell in glucose, and 197 molecules of mRNA.
The E/G ratio is not given if it was close to 1, or if it was not
repeatable (NR) in multiple gels. Some gene products were difficult
to distinguish either on a protein or an mRNA basis; these are pooled
(e.g., Tif1 and Tif2 are pooled; Ssb1 and Ssb2 are pooled). “No Nla”
indicates that there was no suitable NlaIII site in the 3’ region of the
gene, so there is no SAGE mRNA data. “No Met” indicates that the
mature gene product contains no methionines, so there is no reliable
protein data.


Table 1. Quantitative Data

Name      CAI     mRNA     Aj. mRNA  Protein (Glu)  Protein (Eth)  Ratio
                                     x 10^3         x 10^3         E/G
Adh1      0.810   197      197       1230           972            0.79
Adh2      0.504   0        0                        963            > 20
Cit2      0.185   1        2.8       23             288            12
Eno1      0.870   no Nla             410            974            2.4
Eno2      0.892   248      248       650            215            0.33
Fba1      0.868   179      179       640            608            0.95
Hxk1,2    0.50    13       10.5      62             46
Icl1      0.251   0        0                        671            > 20
Pdb1      0.342   5        5         41             33
Pdc1      0.903   226      226       280            205            0.73
Pfk1      0.465   5        5         75             53             0.71
Pgi1      0.681   14       14        160            120            0.75
Pyc1      0.260   1        0.7       37             34
Tal1      0.579   5        5         110            35
Tdh2      0.904   63       63        430            876            NR
Tdh3      0.924   460      460       1670           1927           NR
Tpi1      0.817   no Nla             no Met         no Met
Efb1      0.762   33       16.5      358            362
Eft1,2    0.801   26       26        99             54             0.55
Prt1      0.303   4        0.7       12             6
Rpa0      0.793   246      246       277            100            0.36
Tif1,2    0.752   29       29        233            106            0.46
Yef3      0.777   36       36        14             nd
Hsc82     0.581   2        2.9       112            75             0.67
Hsp60     0.381   9        2.3       35             82             2.3
Hsp82     0.517   2        1.3       52             135            2.6
Hsp104    0.304   7        7         70             161            2.3
Kar2      0.439   5        10.1      43             102            2.4
Ssa1      0.709   2        4.3       303            421            1.4
Ssa2      0.802   10       5         213            324            1.5
Ssb1,2    0.85    50       50        270            85
Ssc1      0.521   2        2.6       68             80             1.2
Sse1      0.521   8        8         96             48
Sti1      0.247   1        1.1       25             44             1.7
Ade1      0.229   4        4         14             27
Ade3      0.276   2        1.7       12             9
Ade5,7    0.257   2        1.4       14             4
Arg4      0.229   1        8.1       41             41
Gdh1      0.585   10       27        148            55
Gln1      0.524   11       11        77             104            1.3
His4      0.267   3        3         15             23             1.5
Ilv5      0.801   6        6         152            109            0.7
Lys9      0.332   4        4         32             17             0.52
Met6      0.657   no Nla   22        190            80             0.42
Pro2      0.248   3        3         30             12
Ser1      0.258   2        1.2       15             8
Trp5      0.319   5        5         28             12
Act1      0.710   54       54        205            164            0.78
Adk1      0.531   no Nla             47             43
Ald6      0.520   3        3         181            159
Atp2      0.424   1        4.1       76             109            1.4
Bmh1      0.322   46       46        191            137            0.72
Bmh2      0.384   1        1.4       134            147
Cdc48     0.306   2        2.4       32             26
Cdc60     0.299   2        0.86      6              2
Erg20     0.373   5        5         92             39
Gpp1      0.603   16       5         234            158
Gsp1      0.621   3        3         115            39             0.34
Ipp1      0.620   4        4         254            147            0.58
Lcb1      0.173   0.3      0.8       19             40
Mol1      0.423   0        0.45      20             16
Pab1      0.488   3        3         41             19             0.47
Psa1      0.600   15       15        148            56
Rnr4      0.497   6        6         44             37
Sam1      0.494   5        5         59             21
Sam2      0.497   3        15        63             20
Sod1      0.376   36       36        631            618
Uba1      0.212   2        2         14             20
YKL056    0.731   62       62        253            112            0.44
YLR109    0.549   21       21        930
YMR116    0.777   41       41        184            40             0.20

Table 2. Functions of Proteins listed in Table 1.


YPD Title Lines

Adh1

Alcohol dehydrogenase I; cytoplasmic isozyme reducing acetaldehyde to ethanol,
regenerating NAD+

Adh2

Alcohol dehydrogenase II; oxidizes ethanol to acetaldehyde, glucose-repressed

Cit2

Citrate synthase, peroxisomal (nonmitochondrial), converts acetyl-CoA and oxaloacetate
into citrate plus CoA

Eno1

Enolase 1 (2-phosphoglycerate dehydratase), converts 2-phospho-D-glycerate to
phosphoenolpyruvate in glycolysis

Eno2

Enolase 2 (2-phosphoglycerate dehydratase); converts 2-phospho-D-glycerate to
phosphoenolpyruvate in glycolysis

Fba1

Fructose-bisphosphate aldolase II, sixth step in glycolysis

Hxk1

Hexokinase I, converts hexoses to hexose phosphates in glycolysis; repressed by
glucose

Hxk2

Hexokinase II, converts hexoses to hexose phosphates in glycolysis and plays a
regulatory role in glucose repression

Icl1

Isocitrate lyase, peroxisomal, carries out part of the glyoxylate cycle, required for
gluconeogenesis

Pdb1

Pyruvate dehydrogenase complex, E1-beta subunit

Pdc1

Pyruvate decarboxylase isozyme 1

Pfk1

Phosphofructokinase alpha subunit, part of a complex with Pfk2p which carries out a
key regulatory step in glycolysis

Pgi1

Glucose-6-phosphate isomerase, converts glucose 6-phosphate to fructose 6-phosphate

Pyc1

Pyruvate carboxylase 1; converts pyruvate to oxaloacetate for gluconeogenesis

Tal1

Transaldolase, component of non-oxidative part of pentose-phosphate pathway

Tdh2

Glyceraldehyde-3-phosphate dehydrogenase 2; converts D-glyceraldehyde 3-phosphate
to 1,3-diphosphoglycerate

Tdh3

Glyceraldehyde-3-phosphate dehydrogenase 3, converts D-glyceraldehyde 3-phosphate
to 1,3-diphosphoglycerate

Tpi1

Triosephosphate isomerase, interconverts glyceraldehyde-3-phosphate and
dihydroxyacetone phosphate

Efb1

Translation elongation factor EF-1beta; GDP/GTP exchange factor for Tef1p/Tef2p

Eft1

Translation elongation factor EF-2, contains diphthamide which is not essential for
activity, identical to Eft2p

Eft2

Translation elongation factor EF-2, contains diphthamide which is not essential for
activity, identical to Eft1p

Prt1

Translation initiation factor eIF3 beta subunit (p90), has an RNA recognition (RRM)
domain

Rpa0 (RPP0)

Acidic ribosomal protein A0

Tif1

Translation initiation factor 4A (eIF4A) of the DEAD box family

Tif2

Translation initiation factor 4A (eIF4A) of the DEAD box family

Yef3

Translation elongation factor EF-3A, member of ATP-binding cassette (ABC)
superfamily
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Hsc82 


82 


Hspl04 


Ade3 


Ade5,7 


Arg4 


Gdhl 


Act! 


Adkl 


Ald6 


Atp2 


Bmhl 


Bmh2 


Cdc48 


Chaperonin  homologous  to  Kcoli  HtpG  and  mammalian  HSP90 


Mitochondrial  chaperonin  that  cooperates  with  HsplOp,  homolog  of  E.  coll  GroEL 


Heat-inducible  chaperonin  homologous  to  E.  coli  HtpG  and  mammalian  HSP90 


Heat  shock  protein  required  for  induced  thermotolerance  and  for  resolubilizing 
aggregates  of  denatured  proteins;  important  for  [psi-]  to  [PSI+]  prion  conversion  _ 


Heat  shock  protein  of  the  ER  lumen  required  for  protein  translocation  across  the  ER 
membrane  and  for  nuclear  fusion;  member  of  the  HSP70  family 


Cytoplasmic  chaperone;  heat  shock  protein  of  the  HSP70  family 


Cytoplasmic  chaperone;  member  of  the  HSP70  family 


Heat  shock  protein  of  HSP70  family  involved  in  the  translational  apparatus 


Heat  shock  protein  of  HSP70  family,  cytoplasmic 


Mitochondrial  protein  that  acts  as  an  import  motor  with  Tim44p  and  plays  a  chaperonin 
role  in  receiving  and  folding  of  protein  chains  during  import;  heat  shock  protein  of 
HSP70  family 


Heat  shock  protein  of  the  HSP70  family,  multicopy  suppressor  of  mutants  with 
hyperactivated  ras/cAMP  pathway 


Stress-induced  protein  required  for  optimal  growth  at  high  and  low  temperature;  has 
tetratricopeptide  (TPR)  repeats 


Phosphoribosylamidoimidazole-succinocarboxamide  synthase;  (SAICAR  synthetase), 
catalyzes  the  seventh  step  in  de  novo  purine  biosynthesis  pathway 


C1-tetrahydrofolate synthase (trifunctional enzyme), cytoplasmic


Phosphoribosylamine-glycine ligase (GARSase) plus
phosphoribosylformylglycinamidine cyclo-ligase (AIRSase); bifunctional protein


Argininosuccinate lyase; catalyzes the final step in arginine biosynthesis


Glutamate  dehydrogenase  (NADP+),  combines  ammonia  and  alpha-ketoglutarate  to  form 
glutamate 


Glutamine synthetase, combines ammonia with glutamate in ATP-driven reaction


Phosphoribosyl-AMP  cyclohydrolase  /  phosphoribosyl-ATP  pyrophosphohydrolase  / 
histidinol  dehydrogenase,  second,  third,  and  tenth  steps  of  his  biosynthesis  pathway 


Ketol-acid reductoisomerase (acetohydroxy-acid reductoisomerase)
(alpha-keto-beta-hydroxylacyl reductoisomerase), second step in val and ilv biosynthesis pathway


Saccharopine dehydrogenase (NADP+, L-glutamate forming) (saccharopine reductase),
seventh step in lysine biosynthesis pathway


Homocysteine methyltransferase; (5-methyltetrahydropteroyl triglutamate-homocysteine
methyltransferase), methionine synthase, cobalamin-independent


Gamma-glutamyl  phosphate  reductase  (phosphoglutamate  dehydrogenase),  proline 
biosynthetic  enzyme 


Phosphoserine  transaminase;  involved  in  synthesis  of  serine  from  3-phosphoglycerate 


Tryptophan synthase, last (fifth) step in tryptophan biosynthesis pathway


Actin,  involved  in  cell  polarization,  endocytosis,  and  other  cytoskeletal  functions 


Adenylate kinase (GTP:AMP phosphotransferase), cytoplasmic


Cytosolic  acetaldehyde  dehydrogenase 


Beta subunit of F1-ATP synthase; 3 copies are found in each F1 oligomer


Homolog  of  mammalian  14-3-3  protein,  has  strong  similarity  to  Bmh2 


Homolog of mammalian 14-3-3 protein, has strong similarity to Bmh1


Protein  of  the  AAA  family  of  ATPases,  required  for  cell  division  and  homotypic 
membrane  fusion 




Cdc60 

Leucyl-tRNA  synthetase,  cytoplasmic 

Erg20 

Farnesyl  pyrophosphate  synthetase  (FPP  synthetase),  may  be  rate-limiting  step  in  sterol 
biosynthesis  pathway 

Gpp1

(Rhr2) 

DL-glycerol  phosphate  phosphatase 

Gsp1

Ran,  a  GTP-binding  protein  of  the  ras  superfamily  involved  in  trafficking  through 
nuclear  pores 

Ipp1

Inorganic  pyrophosphatase,  cytoplasmic 

Lcb1

Component  of  serine  C-palmitoyltransferase,  first  step  in  biosynthesis  of  long-chain  base 
component  of  sphingolipids 

Mol1

(Thi4) 

Thiamine-repressed  protein  essential  for  growth  in  the  absence  of  thiamine 

Pab1

Poly(A)-binding  protein  of  cytoplasm  and  nucleus,  part  of  the  3'-end  RNA-processing 
complex  (cleavage  factor  I),  has  4  RNA  recognition  (RRM)  domains 

Psa1

Mannose-1-phosphate guanyltransferase; GDP-mannose pyrophosphorylase

Rnr4 

Ribonucleotide  reductase  small  subunit 

Sam1

S-adenosylmethionine  synthetase  1 

Sam2 

S-adenosylmethionine  synthetase  2 

Sod1

Copper-zinc  superoxide  dismutase 

Uba1

Ubiquitin-activating (E1) enzyme

YKL056 

Resembles  translationally-controlled  tumor  protein  (TCTP)  of  animal  cells  and  higher 
plants 

YLR109 

(Ahp1)

Alkyl  hydroperoxide  reductase 

YMR116 

(Asc1)

Abundant protein with effects on translational efficiency and cell size, has two WD
(WD-40) repeats

‘ Protein names are the accepted names from the Saccharomyces Genome Database
and YPD. Names in parentheses represent recent name changes.

^ YPD Title Lines are courtesy of Proteome, Inc.
(http://www.proteome.com/YPDhome.html).

©  1999  Proteome,  Inc.  Reprinted  with  permission. 
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Figure  1.  Two-dimensional  gels. 


The  horizontal  axis  is  the  isoelectric  focusing  dimension,  which 
stretches  from  pH  6.7  (left)  to  pH  4.3  (right).  The  vertical  axis  is  the 
polyacrylamide  gel  dimension,  which  stretches  from  about  15  kDa 
(bottom) to at least 130 kDa (top). For Fig. 1A, extract was made
from cells in log phase in glucose, while for Fig. 1B, cells were grown in
ethanol.  The  spots  labeled  1,  2,  3,  4,  5,  and  6  are  unidentified 
proteins  highly  induced  in  ethanol. 

Figure  2.  Correlation  of  Protein  Abundance  with  adjusted  mRNA 
Abundance. 

The  number  of  molecules  per  cell  of  each  protein  is  plotted 
against  the  number  of  molecules  per  cell  of  the  cognate  mRNA,  with  a 
Pearson  correlation  coefficient  of  0.76.  Note  the  logarithmic  axes. 
The mRNA data were taken from Velculescu et al. (27) and Wodicka et
al. (30) and combined as described in Materials and Methods.
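
As an illustration of the statistic quoted in this legend, the following minimal sketch computes a Pearson correlation coefficient between protein and mRNA abundances, both on the raw values and after log10 transformation. The legend does not state which of the two the reported 0.76 refers to, so both are shown. The function names and the toy abundance values are hypothetical and are not data from this report.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def log_pearson(xs, ys):
    """Pearson correlation after log10-transforming both variables.

    All values must be strictly positive.
    """
    return pearson([math.log10(x) for x in xs],
                   [math.log10(y) for y in ys])


if __name__ == "__main__":
    # Hypothetical molecules-per-cell values, for illustration only.
    mrna = [0.5, 2.0, 10.0, 50.0]
    protein = [1_000.0, 8_000.0, 30_000.0, 400_000.0]
    print("raw r   =", round(pearson(mrna, protein), 3))
    print("log10 r =", round(log_pearson(mrna, protein), 3))
```

When abundances span several orders of magnitude, the raw coefficient tends to be dominated by the few most abundant species, which is one reason to examine the log-transformed value alongside a log-log plot.
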
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Figure  3.  Correlation  of  Protein  Abundance  with  Codon  Adaptation 
Index. 

The  number  of  molecules  per  cell  of  each  protein  is  plotted 
against  the  Codon  Adaptation  Index  for  that  protein.  Note  the 
logarithmic scale on the protein axis. Data for the Codon Adaptation
Index were taken from the YPD Database (13).

Figure  4.  Distribution  of  Codon  Adaptation  Index. 

The  distribution  of  the  Codon  Adaptation  Index  over  the  whole 
genome  is  shown  in  intervals  of  0.030.  That  is,  there  are  150  genes 
with  a  CAI  between  0.000  and  0.030,  inclusive;  31  genes  with  a  CAI 
between  0.031  and  0.060;  269  genes  with  a  CAI  between  0.061  and 
0.090;  1296  genes  with  a  CAI  between  0.091  and  0.120;  etc.  The 
distribution  peaks  with  2028  genes  with  a  CAI  between  0.121  and 
0.150. 
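
The binning described in this legend can be sketched as follows. This is a minimal illustration that assumes CAI values are reported to three decimal places and that the first interval includes both 0.000 and 0.030, as the legend states. The function names and the example values are hypothetical.

```python
from collections import Counter


def cai_bin(cai, width_thousandths=30):
    """Return the 0-based interval index for a CAI value.

    Intervals follow the legend: 0.000-0.030 (inclusive), 0.031-0.060,
    0.061-0.090, and so on. Working in integer thousandths avoids
    floating-point trouble at the interval edges.
    """
    t = round(cai * 1000)
    if t < 0 or t > 1000:
        raise ValueError("CAI must lie between 0 and 1")
    return max(0, (t - 1) // width_thousandths)


def bin_label(index, width_thousandths=30):
    """Human-readable label for an interval index, e.g. '0.121-0.150'."""
    low = 0 if index == 0 else index * width_thousandths + 1
    high = (index + 1) * width_thousandths
    return f"{low / 1000:.3f}-{high / 1000:.3f}"


def cai_histogram(values):
    """Count how many CAI values fall into each interval."""
    return Counter(cai_bin(v) for v in values)


if __name__ == "__main__":
    # Hypothetical CAI values, for illustration only.
    example = [0.010, 0.030, 0.031, 0.130, 0.150]
    for index, count in sorted(cai_histogram(example).items()):
        print(bin_label(index), count)
```
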


Figure  5.  Phosphorylated  Proteins. 

Fig. 5A shows a mixture of 32P-labeled proteins and 35S-labeled
proteins. Two separate labeling reactions were done, one with 32P
and one with 35S, and extracts were mixed and run on a 2D gel. Spots
marked with numbers (e.g., 6400, 6560) rather than gene names
represent spots noted on 35S gels, but unidentified. Spots labeling
with 32P were identified by (i) increased labeling as compared to the
35S-only gel (not shown); (ii) the characteristic fuzziness of a
32P-labeled spot; (iii) the decay of signal intensity seen on exposures
made 4 weeks later (not shown). A minor form of Tpi1 and at least
six minor forms of Tif1 have been noted in over-expression
experiments (and see Fig. 6B); the positions of the minor forms are
indicated by circles.

Fig. 5B shows 32P-only labeling. The major form of Tpi1, which
is not labeled with 32P, is indicated by a circle. The positions of 7
forms of Tif1 are indicated by circles.
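
Criterion (iii) in the Fig. 5A legend relies on the different decay rates of the two isotopes. The sketch below estimates the fraction of signal remaining after a given delay. The half-lives used (about 14.3 days for 32P and about 87.4 days for 35S) are standard physical values supplied here as assumptions and are not taken from this report. The report does not say that the criterion was applied quantitatively.

```python
# Approximate half-lives in days (assumed standard values, not from the report).
HALF_LIFE_DAYS = {"32P": 14.3, "35S": 87.4}


def fraction_remaining(isotope, days):
    """Fraction of the initial radioactivity left after `days` days."""
    return 0.5 ** (days / HALF_LIFE_DAYS[isotope])


if __name__ == "__main__":
    delay = 28  # 4 weeks, as in the legend
    for isotope in ("32P", "35S"):
        print(isotope, round(fraction_remaining(isotope, delay), 2))
```

Under these assumptions a 32P spot keeps roughly a quarter of its signal after 4 weeks, while a 35S spot keeps roughly 80%, so a spot that fades markedly between exposures is most likely 32P-labeled.
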

Figure  6.  Fractionation  by  Centrifugation. 

Fig. 6A shows the proteins in the supernatant of a 100,000 g,
30 min. spin. Fig. 6B shows the proteins in the pellet of a 16,000 g,
10 min. spin. Supernatant fractions examined in multiple
experiments done over a wide range of g forces looked similar to
each other, as did the pellet fractions.
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